Summary

The objective of the proposed research is to develop reliable algorithms that can achieve aggressive image data compression (with a compression ratio of 60 times or more) for real-time implementation. Typical applications of such algorithms include terrestrial HDTV broadcasting, space communications, and the handling and disposal of toxic materials and nuclear wastes with remotely controlled robots. The state-of-the-art techniques are hampered by serious technical barriers of codebook design complexity.

The proposed approach is built on a vector quantization (VQ) algorithm recently developed by the PI. The codebook design complexity of this VQ algorithm is only linearly proportional to the codebook size (significantly less than that of conventional algorithms), and the encoding complexity is independent of the codebook size. Highlighting the proposed approach is a piecewise-linear transform preceding VQ, based on the concept of entropy partitioning.

The novelty of the proposed algorithm is due to the following: (i) introduction of a piecewise-linear transform to VQ so as to retain more input information; (ii) exploitation of both inter-block and intra-block redundancy; (iii) use of a parallel distributed network for real-time codebook design.

The proposed research is significant because (i) it addresses the imminent demand for solving the aforementioned real-world problems; (ii) its accomplishment will alleviate the serious complexity barrier of conventional VQ algorithms; (iii) it pushes forward the technical frontiers of data compression.
I. EXECUTIVE SUMMARY


This final report summarizes the accomplishments and overall progress made during the period of April 1, 1989 to March 31, 1990 for the research project sponsored by the Office of Naval Research under Contract N00014-89-J-1788. It lists the publications, theses, and dissertations that have been sponsored, in part, by this research project. A selected subset of the related publications is also included.

The objective of this research project is to perform fundamental studies on the theory of selective update in signal and image processing. The approach is based on the selective use of input data in retrieving information about the underlying signals. The selection is based on the information content of the incoming data. Of particular interest here is parameter estimation in adaptive systems. The significance of this research project is that it is a timely response to the demand for higher levels of machine automation and man-machine interaction.

Over the last few decades, much effort has been devoted to improving the effectiveness of data processing, particularly in integrated circuit technology. The advent of very large scale integrated (VLSI) circuit technology has made fast, high-density circuit devices available at lower cost. Processing large volumes of data in real time has thus become more feasible and cost-effective in practice. As such, modern signal and image processing calls for algorithms that are compatible with these technological advances. In particular, algorithms that can be implemented with higher degrees of modularity and concurrency and with higher levels of machine intelligence, thereby providing higher data-throughput rates, are more appealing in practice.

Most, if not all, of these efforts have focused on improving general computational capabilities or the architectures for performing arithmetic operations. One critical issue that has often been overlooked is the extent of intelligence incorporated in the algorithms implemented. In particular, selective use of the input data to improve the efficiency of information retrieval is as critical as improving the speed of simple arithmetic operations. An essential reason for the selective use of input data is that it eliminates redundant processing and thus could significantly improve the potential of modular concurrent processing. It also incorporates a decision-making procedure in the selection of data, thus enhancing the level and capability of machine automation.

This research project studies the selective use of information in the context of adaptive signal processing. The groundwork on which this research project rests is a set of recursive parameter estimation algorithms, the so-called optimal bounding ellipsoid (OBE) algorithms, which feature a discerning update strategy. This discerning update is in sharp contrast to the continual update used by most existing algorithms.

The OBE algorithms belong to a class of parameter estimation/identification algorithms termed set-membership (SM) algorithms. The SM algorithms use set-theoretic a priori knowledge about the underlying model to constrain the solutions to a certain set. In particular, the disturbance and the input signals are assumed to be bounded in some sense. The OBE algorithms are perhaps the most viable SM estimation technique in terms of analytical tractability and practical appeal.

The emerging field of SM-based signal processing has received considerable attention and is becoming increasingly popular in the research community around the world. Many special sessions at professional conferences have been organized, and special issues of professional journals have been published. Researchers around the world are clearly excited about the potential of SM-based algorithms for problems of practical importance; among the more notable applications are time series analysis, spectrum estimation, speech and image enhancement/processing, biology and chemistry, and pharmacokinetics.

One of the striking features of recursive SM-based algorithms, and thus of the OBE algorithms, is a discerning update strategy for the parameter estimates. An important outcome of such discerning updates is that the resulting algorithm can be implemented with two modules: an information processor followed by an updating processor. The former decides whether an update is needed; the decision is based on evaluating the "information quality" of the input data, the prediction error, and the noise bound. It is essential that this information evaluation involve very little computational effort, which is the case here. The latter then updates the parameter estimates when the information processor decides that an update is needed.
Simulations of the OBE algorithms have shown that, in general, less than 20% of the input data are used to update the parameter estimates. This is true for most practical systems that can be modeled by autoregressive processes with exogenous inputs (ARX) or by autoregressive moving average (ARMA) processes whose order is less than ten.

Conceptually, thanks to the modularity and to the fact that less than 20% of the input data are used to update the parameter estimates, an adaptive signal processing network may be constructed. The network would consist of a number of such modular recursive estimators, each comprising two modules, namely, the information evaluator and the updating processor. By sharing these modules among channels, the idle time of both the information evaluator and the updating processor can be reduced, thus increasing the data-throughput rate. In addition, the reliability of signal processing can be improved greatly. In essence, this type of adaptive network will be able to perform multichannel adaptation and filtering while improving reliability and data-throughput rates. One important application is adaptive array processing in sonar systems.

In this project, several fundamental issues associated with this kind of estimation algorithm are investigated. To begin with, one of the OBE algorithms is extended to the estimation of the parameters of autoregressive moving average (ARMA) processes. The resulting algorithm is referred to as the EOBE algorithm. The ARMA process has been used to model signals encountered in underwater array signal processing.

Among other issues, convergence of ARMA parameter estimation is of critical importance to practical implementation. We showed that if the input noise is bounded in magnitude and the moving average parameters satisfy a certain magnitude bound, then the a posteriori prediction errors are uniformly bounded. With an additional persistence of excitation condition, the parameter estimates are shown to converge to a neighborhood of the true parameters, and the a priori prediction errors are asymptotically bounded. In contrast, the conventional extended least-squares (ELS) algorithm requires the strictly positive real (SPR) condition to assure convergence.
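The SPR requirement mentioned above can be checked numerically. The sketch below is a hypothetical helper (the name `els_spr_condition` is ours, not from the report) for the usual ELS condition that 1/C(q^-1) - 1/2 be strictly positive real, where C(q^-1) = 1 + c_1 q^-1 + ... + c_r q^-r is the MA polynomial. It assumes the condition can be tested by requiring 1/C to be stable and Re[1/C(e^{jw})] > 1/2 on a dense frequency grid; a grid check approximates the condition and is not a proof.

```python
import numpy as np


def els_spr_condition(c, num_freqs=4096):
    """Approximate test of the ELS condition: 1/C(q^-1) - 1/2 is SPR.

    c = [c_1, ..., c_r] holds the MA coefficients of
    C(q^-1) = 1 + c_1 q^-1 + ... + c_r q^-r.
    """
    poly = np.concatenate(([1.0], np.asarray(c, dtype=float)))
    # 1/C must be stable: every root of z^r + c_1 z^(r-1) + ... + c_r
    # has to lie strictly inside the unit circle.
    if np.any(np.abs(np.roots(poly)) >= 1.0):
        return False
    # Evaluate C at q^-1 = exp(-j*omega). Real coefficients make
    # Re[1/C] even in omega, so [0, pi] covers the whole unit circle.
    omega = np.linspace(0.0, np.pi, num_freqs)
    powers = np.arange(len(poly))
    C = np.exp(-1j * np.outer(omega, powers)) @ poly
    # SPR of 1/C - 1/2 on the grid: Re[1/C(e^{j omega})] > 1/2 everywhere.
    return bool(np.min(np.real(1.0 / C)) > 0.5)


print(els_spr_condition([0.5]))        # True
print(els_spr_condition([1.6, 0.8]))   # False: Re[1/C] = 1/3.4 at omega = 0
```

The second call fails even though 1/C is stable (both roots of z^2 + 1.6z + 0.8 have magnitude sqrt(0.8)), because Re[1/C] at omega = 0 is 1/3.4, about 0.29. The EOBE analysis summarized above relies instead on a magnitude bound on the MA coefficients.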

It is worth mentioning that an important virtue of the EOBE algorithm is that, under rather mild conditions, the bounding ellipsoids always contain the true parameter, providing a 100% confidence region for it. This feature is not shared by conventional algorithms, which guarantee it only asymptotically.

Implementation on finite word-length processors is almost a mandate for all signal processing algorithms. We investigated the performance of the OBE algorithms in a finite word-length environment via simulations. In particular, roundoff error accumulation and numerical stability were studied with fixed-point simulations. An analysis of error propagation in an OBE algorithm was also performed, which shows that the errors in the estimates due to an initial perturbation are bounded. Based on these results, we showed that the OBE and the EOBE algorithms appear to be superior to the RLS and the ELS algorithms, respectively.

One possible reason for these encouraging results is the discerning update strategy, which updates the parameter estimates less frequently and thereby accumulates fewer roundoff errors. Another reason lies in the update equations themselves, which may require more detailed analysis. Nevertheless, these results further support our conjecture that eliminating redundant use of the information contained in the received data would reduce the effects of roundoff errors.

We further investigated one of the OBE algorithms in terms of its tracking properties. Conditions that ensure the existence of these 100% confidence regions in the face of small model parameter variations were derived. For larger parameter variations, it was shown that the existence of the 100% confidence region can be guaranteed asymptotically. A modification of the OBE algorithm was also proposed to enable tracking of larger variations. Our simulation results have shown that the modified algorithm has tracking performance comparable, and in some cases superior, to that of the exponentially weighted recursive least-squares algorithm.

In summary, our studies in this one-year project established the practical viability of estimation algorithms that selectively use the input data.
"This Thesis is being included in this final report
Parameter Estimation Using a Novel
Recursive Estimation Algorithm with
Selective Updating


ASHOK K. RAO, YIH-FANG HUANG, MEMBER, IEEE, AND SOURA DASGUPTA


Abstract: This paper investigates an extension of a recursive estimation algorithm (the so-called OBE algorithm [9]-[11]) which features a discerning update strategy. In particular, an extension of the algorithm to ARMA parameter estimation is presented here along with a convergence analysis. The extension is similar to the extended least-squares algorithm. However, the convergence analysis is complicated by the discerning update strategy, which incorporates an information-dependent updating factor. The virtues of such an update strategy are: 1) more efficient use of the input data in terms of information processing, and 2) a modular adaptive filter structure which would facilitate the development of a parallel-pipelined signal processing architecture. It is shown in this paper that if the input noise is bounded and the moving average parameters satisfy a certain magnitude bound, then the a posteriori prediction errors are uniformly bounded. With an additional persistence of excitation condition, the parameter estimates are shown to converge to a neighborhood of the true parameters, and the a priori prediction errors are shown to be asymptotically bounded. Simulation results show that the parameter estimation error for the EOBE algorithm is comparable to that for the ELS algorithm.


I. Introduction

In many adaptive signal processing applications, such as speech processing, seismic data processing, and channel equalization, a signal $y(t)$ is often considered as the output of an IIR filter driven by unknown white noise $w(t)$ [1]. The signal $y(t)$ can therefore be modeled as an autoregressive moving average (ARMA) process of the form

$$y(t) = a_1 y(t-1) + \cdots + a_n y(t-n) + w(t) + c_1 w(t-1) + \cdots + c_r w(t-r). \tag{1.1}$$

Fitting this ARMA model to the measured data $y(t)$, $t = 1, 2, \ldots$, requires the estimation of the parameters $a_1, \ldots, a_n, c_1, \ldots, c_r$. Many methods for the estimation of ARMA parameters have been proposed in the literature, particularly from the spectral estimation viewpoint. Among the more recent are Cadzow's overdetermined rational equation method [2], the spectral matching technique of Friedlander and Porat [3], and the extended Yule-Walker method of Kaveh [4]. A common feature of these methods is the use of the sample autocorrelation sequence of the output process $y(t)$. In the context of system identification, the extended least-squares (ELS), the recursive maximum likelihood (RML), and multistage least-squares algorithms have been used to recursively estimate ARMA parameters [5], [6], [12]. The ELS algorithm uses the a posteriori prediction error $\varepsilon(t)$ as an estimate of $w(t)$. The regressor vector is formed from $y(t-1), \ldots, y(t-n)$ and $\varepsilon(t-1), \ldots, \varepsilon(t-r)$. The standard recursive least-squares (RLS) algorithm is then employed to update the estimates. The algorithm is conceptually simple but restrictive in the sense that its convergence can be assured only if the underlying transfer function

$$H(q^{-1}) = \frac{1}{C(q^{-1})} - \frac{1}{2}$$

is strictly positive real (SPR), with $q^{-1}$ being the delay operator and

$$C(q^{-1}) = 1 + c_1 q^{-1} + c_2 q^{-2} + \cdots + c_r q^{-r}. \tag{1.2}$$

The RML algorithm, which uses a filtered version of the regressor vector used in the ELS algorithm, does not require $H(q^{-1})$ to be SPR. However, the estimates have to be monitored and projected into a stability region to ensure convergence [5].

Manuscript received April 30, 1988; revised May 4, 1989. This work was supported in part by the National Science Foundation under Grant MIP-87[illegible], in part by the Office of Naval Research under Contract N00014-87-K-[illegible], and in part by the National Science Foundation under Grant ECS-8618240.
A. K. Rao is with COMSAT Laboratories, Clarksburg, MD 20871.
Y.-F. Huang is with the Department of Electrical and Computer Engineering, University of Notre Dame, Notre Dame, IN 46556.
S. Dasgupta is with the Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA 52242.

In addition to the aforementioned least-squares based methods, there exists a different class of estimation algorithms that estimate membership sets of parameters which are consistent with the measurements and noise constraints [7]-[11]. These algorithms are particularly useful when the noise distribution is unknown but constraints in the form of bounds on the instantaneous values of the noise are available. To the best of our knowledge, none of these algorithms has been applied to the problem of ARMA parameter estimation. Among these algorithms based on membership sets, a group of seminal recursive algorithms are the so-called optimal bounding ellipsoid (OBE) algorithms [9]-[11]. The OBE algorithms have been developed using a set-theoretic formulation and are applicable to autoregressive with exogenous input (ARX) models with bounded noise. One of the main features of these temporally recursive algorithms is a discerning update strategy. This feature, obtained by the introduction of an information-dependent updating (forgetting) factor, yields a modular structure, thereby increasing the potential for concurrent and pipelined processing of signals. The presence of such a forgetting factor also gives the algorithms the ability to track slowly time-varying parameters. One of the algorithms [11] has been shown to possess the advantageous feature of automatic asymptotic cessation of updates if the model is time invariant. If a loose upper bound on the noise magnitude is known, and if the input is persistently exciting and sufficiently uncorrelated with the noise, then it has been shown in [11] that the parameter estimates converge asymptotically to a neighborhood of the true parameter vector.

In this paper, we extend one of the OBE algorithms [11] to the ARMA case. For the ARMA parameter estimation problem, the OBE algorithm cannot be applied in its present form. However, by assuming that the input white noise is bounded in magnitude, the OBE algorithm can be extended in a manner similar to the ELS algorithm. Convergence analysis of the resulting algorithm is performed by imposing a bound on the sum of the magnitudes of the MA coefficients. This ensures that the true parameter vector is contained in all the optimal bounding ellipsoids. A uniform bound on the a posteriori prediction error can then be derived. In contrast, even though the a posteriori prediction errors are generated in a stable fashion in the ELS algorithm [5], it is difficult to obtain an expression for even the asymptotic bound, if such a bound exists. By imposing a persistence of excitation condition on the regressor vector, the a priori prediction error of the extended OBE algorithm is shown to be bounded, and the parameter estimates are shown to converge to a neighborhood of the true parameter vector.
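Since the extension mirrors the ELS algorithm, a compact sketch of ELS as summarized in this introduction may help fix ideas: RLS applied to a regressor built from past outputs and past a posteriori prediction errors. This is a minimal illustration under our own assumptions (the function name `els`, the initialization $P(0) = M I$ with $M = 10^6$, and zero values before $t = 0$ are hypothetical choices), not the implementation used in the paper's simulations.

```python
import numpy as np


def els(y, n, r, M=1e6):
    """Extended least squares for the ARMA model (1.1): RLS applied to the
    regressor [y(t-1), ..., y(t-n), eps(t-1), ..., eps(t-r)]."""
    theta = np.zeros(n + r)
    P = M * np.eye(n + r)                    # large initial covariance
    e = np.zeros(len(y))                     # a posteriori errors, estimates of w(t)
    for t in range(len(y)):
        past_y = [y[t - i] if t - i >= 0 else 0.0 for i in range(1, n + 1)]
        past_e = [e[t - j] if t - j >= 0 else 0.0 for j in range(1, r + 1)]
        phi = np.array(past_y + past_e)
        Pphi = P @ phi
        gain = Pphi / (1.0 + phi @ Pphi)     # RLS gain vector
        theta = theta + gain * (y[t] - theta @ phi)
        P = P - np.outer(gain, Pphi)         # P - P phi phi^T P / (1 + phi^T P phi)
        e[t] = y[t] - theta @ phi            # a posteriori prediction error
    return theta, e
```

For $r = 0$ the recursion is plain RLS on an AR model, so its estimate should agree with batch least squares up to the tiny regularization implied by $P(0)$; the check below uses that case.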

The paper is organized as follows. In Section II, a brief review of the OBE algorithm and its properties is presented. In Section III, the algorithm is extended to ARMA parameter estimation. Convergence analysis of the extended algorithm is performed in Section IV. The performance of the algorithm is compared with that of the ELS algorithm through simulation studies in Section V. Section VI concludes the paper.

II. The OBE Algorithm

Consider the ARX model described by

$$y(t) = a_1 y(t-1) + \cdots + a_n y(t-n) + b_0 u(t) + b_1 u(t-1) + \cdots + b_m u(t-m) + v(t)$$

where $y(t)$ is the output, $u(t)$ is the measurable input, and $v(t)$ represents the uncertainty or noise. The above equation can be recast as

$$y(t) = \theta^{*T}\phi(t) + v(t) \tag{2.1}$$

where

$$\theta^* = [a_1, a_2, \ldots, a_n, b_0, b_1, \ldots, b_m]^T$$

is the vector of true parameters and

$$\phi(t) = [y(t-1), y(t-2), \ldots, y(t-n), u(t), u(t-1), \ldots, u(t-m)]^T$$

is the regressor vector. A key assumption here is that the noise is bounded in magnitude, i.e., there exists a $\gamma_0 > 0$ such that $v^2(t) \le \gamma_0^2$ for all $t$; hence,

$$(y(t) - \theta^{*T}\phi(t))^2 \le \gamma_0^2.$$

Let $S_t$ be a subset of the Euclidean space $\mathbb{R}^{n+m+1}$ defined by

$$S_t = \{\theta : (y(t) - \theta^T\phi(t))^2 \le \gamma_0^2,\ \theta \in \mathbb{R}^{n+m+1}\}.$$

From a geometric point of view, $S_t$ is a convex polytope in the parameter space and contains the vector of true parameters. The OBE algorithm starts off with a large ellipsoid $E_0$ in $\mathbb{R}^{n+m+1}$ which contains all possible values of the modeled parameter $\theta^*$. After the first observation $y(1)$ is acquired, an ellipsoid is found which bounds the intersection of $E_0$ and the convex polytope $S_1$. This ellipsoid must be optimal in some sense, say minimum volume [9], [10], or by any other criterion [9], [11], to hasten convergence. Denoting the optimal ellipsoid by $E_1$, one can proceed exactly as before with future observations and obtain a sequence of optimal bounding ellipsoids $\{E_t\}$. The center of the ellipsoid $E_t$ can be taken as the parameter estimate at the $t$th instant and is denoted by $\theta(t)$. If, at a particular time instant $t$, the resulting optimal bounding ellipsoid would be of a "smaller size," thereby implying that the data point $y(t)$ conveys some fresh "information" regarding the parameter estimates, then the parameters are updated. Otherwise, $E_t$ is set equal to $E_{t-1}$ and the parameters are not updated. It can also be shown [11] that all the ellipsoids $\{E_t,\ t = 1, 2, \ldots\}$ contain the true parameter provided that $E_0$ does.

Let the ellipsoid $E_{t-1}$ at the $(t-1)$th instant be formulated by

$$E_{t-1} = \{\theta : (\theta - \theta(t-1))^T P^{-1}(t-1)(\theta - \theta(t-1)) \le \sigma^2(t-1)\}$$

for some positive definite matrix $P(t-1)$ and a nonnegative scalar $\sigma^2(t-1)$. Then, given $y(t)$, an ellipsoid which bounds $E_{t-1} \cap S_t$ "tightly" is

$$\{\theta : (1-\lambda_t)(\theta - \theta(t-1))^T P^{-1}(t-1)(\theta - \theta(t-1)) + \lambda_t (y(t) - \theta^T\phi(t))^2 \le (1-\lambda_t)\sigma^2(t-1) + \lambda_t\gamma_0^2\}$$

where the forgetting factor $\lambda_t$ satisfies $0 \le \lambda_t < 1$. The size of the bounding ellipsoid is related to the scalar $\sigma^2(t-1)$ and the eigenvalues of $P(t-1)$. The update equations for $\theta(t)$, $P(t)$, and $\sigma^2(t)$ are derived in [11]. The optimal ellipsoid which bounds the intersection of $E_{t-1}$ and $S_t$ is defined in terms of an optimal value of $\lambda_t$. For the OBE algorithm of [11], the optimum value $\lambda_t^*$ is determined by minimizing $\sigma^2(t)$ with respect to $\lambda_t$ at every time instant. The minimization procedure results in $\lambda_t$ being set equal to zero (no update) if

$$\sigma^2(t-1) + \delta^2(t) \le \gamma_0^2 \tag{2.4}$$

where $\delta(t) = y(t) - \theta^T(t-1)\phi(t)$ is the prediction error based on the previous estimate.

If (2.4) is not satisfied, then the optimal value of $\lambda_t$ is computed. The parameter estimation procedure is depicted in Fig. 1. An outgrowth of this modular recursive estimation procedure is a parallel-pipelined networking structure [13]. The algorithm is such that the computational complexity of the information evaluation (IE) procedure is much less than that of the updating procedure (UPD). Since, in general, a good number of data samples would be rejected by the IE, both the IE and the UPD would involve significant amounts of idle time. A viable scheme, then, is to configure a parallel-pipelined network comprising such modular estimators to process signals from multiple channels. Apart from reducing hardware costs, such a scheme would offer increased reliability, since the failure of one UPD processor would not cause any of the channels to fail, in contrast to a system with a dedicated UPD processor for each channel.
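The IE/UPD split can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the estimator of [11]: the class `ModularOBE` and its methods are hypothetical; the update uses the recursions (3.6) with $\phi(t)$ in place of $\Phi(t)$, together with the initialization (3.7) and the forgetting-factor rule (3.8)-(3.9) that Section III gives for the extended algorithm, and it omits the upper bound on $\lambda_t$ imposed in [11]. The point is the cost asymmetry: the IE needs one inner product to evaluate (2.4), while the UPD does $O(N^2)$ matrix work for $N$ parameters.

```python
import numpy as np


class ModularOBE:
    """Hypothetical sketch: information evaluator (IE) + updating processor (UPD)."""

    def __init__(self, dim, gamma2, M=1e4, eps=1e-6):
        self.gamma2 = gamma2                 # noise bound gamma_0^2
        self.P = M * np.eye(dim)             # P(0) = M I
        self.theta = np.zeros(dim)           # theta(0) = 0
        self.sigma2 = gamma2 - eps           # sigma^2(0) = gamma^2 - eps
        self.num_updates = 0

    def evaluate(self, y, phi):
        """IE: a single inner product decides whether the data point is informative."""
        delta = y - self.theta @ phi         # prediction error delta(t)
        return delta, self.sigma2 + delta**2 > self.gamma2   # (2.4) violated -> update

    def update(self, delta, phi):
        """UPD: O(dim^2) work, executed only when the IE requests it."""
        Pphi = self.P @ phi
        G = phi @ Pphi
        if G <= 0.0:                         # a zero regressor carries no information
            return
        beta = (self.gamma2 - self.sigma2) / delta**2
        if np.isclose(G, 1.0):
            lam = (1.0 - beta) / 2.0
        else:
            lam = (1.0 - np.sqrt(G / (1.0 + beta * (G - 1.0)))) / (1.0 - G)
        d = 1.0 - lam + lam * G
        self.P = (self.P - lam * np.outer(Pphi, Pphi) / d) / (1.0 - lam)
        self.theta = self.theta + lam * (self.P @ phi) * delta
        self.sigma2 = ((1.0 - lam) * self.sigma2 + lam * self.gamma2
                       - lam * (1.0 - lam) * delta**2 / d)
        self.num_updates += 1

    def step(self, y, phi):
        delta, informative = self.evaluate(y, phi)
        if informative:
            self.update(delta, phi)


# Hypothetical ARX example: y(t) = 0.5 y(t-1) + 1.0 u(t) + 0.3 u(t-1) + v(t), |v| <= 0.1
rng = np.random.default_rng(1)
theta_true = np.array([0.5, 1.0, 0.3])
gamma0 = 0.1
est = ModularOBE(dim=3, gamma2=gamma0**2)
y_prev = u_prev = 0.0
for t in range(2000):
    u = rng.uniform(-1.0, 1.0)
    phi = np.array([y_prev, u, u_prev])
    y = theta_true @ phi + rng.uniform(-gamma0, gamma0)
    est.step(y, phi)
    y_prev, u_prev = y, u
print(f"updates: {est.num_updates} of 2000, estimate: {est.theta}")
```

Because the IE work is a single inner product, one UPD could in principle serve several channels whose IEs rarely request updates at the same time, which is the motivation for the parallel-pipelined network described above.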

III. Extension to ARMA Models

The ARMA model described by (1.1) can be rewritten as

$$w(t) = y(t) - \theta^{*T}\Phi^*(t) \tag{3.1}$$

where $\theta^*$, the vector of true parameters, and $\Phi^*(t)$ are defined by

$$\theta^* = [a_1, a_2, \ldots, a_n, c_1, c_2, \ldots, c_r]^T$$

$$\Phi^*(t) = [y(t-1), \ldots, y(t-n), w(t-1), \ldots, w(t-r)]^T.$$

Here again, $w(t)$ is assumed to be bounded in magnitude, i.e., there exists a positive $\gamma_0$ such that

$$w^2(t) \le \gamma_0^2. \tag{3.2}$$

Since the values of the noise sequence $\{w(t)\}$ are not available, the regressor vector $\Phi^*(t)$ is not known exactly. If, however, at time $t$, an estimate of $\theta^*$,

$$\theta(t) = [a_1(t), \ldots, a_n(t), c_1(t), \ldots, c_r(t)]^T \tag{3.3}$$

is available, $w(t)$ could be estimated by the a posteriori prediction error

$$\varepsilon(t) = y(t) - \theta^T(t)\Phi(t) \tag{3.4}$$

where

$$\Phi(t) = [y(t-1), \ldots, y(t-n), \varepsilon(t-1), \ldots, \varepsilon(t-r)]^T. \tag{3.5}$$


Fig. 1. A modular recursive estimator.


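Now, just as in the ARX case, define for some suitable $\gamma$ the convex polytope

$$S_t = \{\theta : (y(t) - \theta^T\Phi(t))^2 \le \gamma^2,\ \theta \in \mathbb{R}^{n+r}\}$$

and the bounding ellipsoid

$$E_t = \{\theta \in \mathbb{R}^{n+r} : (\theta - \theta(t))^T P^{-1}(t)(\theta - \theta(t)) \le \sigma^2(t)\}.$$

The update equations for $\theta(t)$, $P(t)$, and $\sigma^2(t)$, which then follow directly from [11], are as in the ARX case, the only difference being that the regressor vector is now given by (3.5):

$$P^{-1}(t) = (1-\lambda_t)P^{-1}(t-1) + \lambda_t\Phi(t)\Phi^T(t) \tag{3.6a}$$

$$\theta(t) = \theta(t-1) + \lambda_t P(t)\Phi(t)\delta(t) \tag{3.6b}$$

$$\delta(t) = y(t) - \theta^T(t-1)\Phi(t) \tag{3.6c}$$

$$\sigma^2(t) = (1-\lambda_t)\sigma^2(t-1) + \lambda_t\gamma^2 - \frac{\lambda_t(1-\lambda_t)\delta^2(t)}{1-\lambda_t+\lambda_t G(t)} \tag{3.6d}$$

where

$$G(t) = \Phi^T(t)P(t-1)\Phi(t). \tag{3.6e}$$

The matrix inversion lemma can be used in (3.6a) to obtain the following recursion for $P(t)$:

$$P(t) = \frac{1}{1-\lambda_t}\left[P(t-1) - \frac{\lambda_t P(t-1)\Phi(t)\Phi^T(t)P(t-1)}{1-\lambda_t+\lambda_t G(t)}\right]. \tag{3.6f}$$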
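A direct transcription of (3.6) is handy for checking the algebra. The sketch below (the function name `eobe_update` is ours) applies (3.6b)-(3.6f) for a given forgetting factor, and the accompanying check confirms numerically that the rank-one form (3.6f) inverts (3.6a). The forgetting factor is passed in rather than optimized, an assumption made only to isolate the update equations; the positive definite $P(t-1)$ and the regressor are arbitrary random values.

```python
import numpy as np


def eobe_update(theta, P, sigma2, y, Phi, gamma2, lam):
    """Apply (3.6b)-(3.6f) for a given forgetting factor 0 <= lam < 1."""
    delta = y - theta @ Phi                                   # (3.6c)
    PPhi = P @ Phi
    G = Phi @ PPhi                                            # (3.6e)
    d = 1.0 - lam + lam * G
    P_new = (P - lam * np.outer(PPhi, PPhi) / d) / (1.0 - lam)  # (3.6f)
    theta_new = theta + lam * (P_new @ Phi) * delta           # (3.6b)
    sigma2_new = ((1.0 - lam) * sigma2 + lam * gamma2
                  - lam * (1.0 - lam) * delta**2 / d)         # (3.6d)
    return theta_new, P_new, sigma2_new


rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
P = A @ A.T + np.eye(3)            # an arbitrary positive definite P(t-1)
Phi = rng.standard_normal(3)
lam = 0.3
theta, P_next, sigma2 = eobe_update(np.zeros(3), P, 0.9, 1.2, Phi, 1.0, lam)

# (3.6a): P^{-1}(t) = (1 - lam) P^{-1}(t-1) + lam Phi Phi^T
P_inv_36a = (1.0 - lam) * np.linalg.inv(P) + lam * np.outer(Phi, Phi)
print(np.allclose(np.linalg.inv(P_next), P_inv_36a))
```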


As in the OBE algorithm, the bounding ellipsoids are optimized by choosing $\lambda_t$ to minimize $\sigma^2(t)$. To facilitate the subsequent analysis, the initial conditions are modified to

$$P(0) = M I_{n+r}, \quad \theta(0) = 0, \quad \sigma^2(0) = \gamma^2 - \epsilon \tag{3.7}$$

where $M \gg 1$, $\epsilon \ll 1$, and $I_{n+r}$ is the identity matrix of dimension $n + r$. This choice of initial conditions ensures that the initial ellipsoid $E_0$ will contain the true parameter vector $\theta^*$ and, more importantly, as shown in Appendix A, simplifies the optimum forgetting factor determination formula to

$$\text{if } \sigma^2(t-1) + \delta^2(t) \le \gamma^2, \text{ then } \lambda_t^* = 0 \tag{3.8}$$

otherwise

$$\lambda_t^* = \frac{1 - \beta(t)}{2} \quad \text{if } G(t) = 1 \tag{3.9a}$$

$$\lambda_t^* = \frac{1}{1 - G(t)}\left[1 - \sqrt{\frac{G(t)}{1 + \beta(t)(G(t) - 1)}}\right] \quad \text{if } G(t) \ne 1 \tag{3.9b}$$

where

$$\beta(t) = \frac{\gamma^2 - \sigma^2(t-1)}{\delta^2(t)}. \tag{3.9c}$$
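The rule (3.8)-(3.9) is easy to check against a brute-force minimization of (3.6d). The sketch below, with hypothetical helper names, evaluates $\lambda_t^*$ and compares $\sigma^2(t)$ at $\lambda_t^*$ with the smallest value on a fine grid of $\lambda_t \in [0, 1)$. Treating $G(t)$ numerically close to 1 with (3.9a) is our own choice, made to avoid cancellation in (3.9b); the test points are arbitrary.

```python
import numpy as np


def optimal_lambda(sigma2_prev, delta, G, gamma2):
    """Optimal forgetting factor lambda_t^* from (3.8)-(3.9)."""
    if sigma2_prev + delta**2 <= gamma2:
        return 0.0                                        # (3.8): no update
    beta = (gamma2 - sigma2_prev) / delta**2              # (3.9c)
    if np.isclose(G, 1.0):
        return (1.0 - beta) / 2.0                         # (3.9a)
    return (1.0 - np.sqrt(G / (1.0 + beta * (G - 1.0)))) / (1.0 - G)  # (3.9b)


def sigma2_next(sigma2_prev, delta, G, gamma2, lam):
    """sigma^2(t) as a function of the forgetting factor, eq. (3.6d)."""
    d = 1.0 - lam + lam * G
    return (1.0 - lam) * sigma2_prev + lam * gamma2 - lam * (1.0 - lam) * delta**2 / d


grid = np.linspace(0.0, 0.999, 5000)
for sigma2_prev, delta, G in [(0.5, 1.0, 2.0), (0.5, 1.0, 0.5), (0.5, 1.0, 1.0), (0.9, 0.4, 10.0)]:
    lam = optimal_lambda(sigma2_prev, delta, G, 1.0)
    at_opt = sigma2_next(sigma2_prev, delta, G, 1.0, lam)
    on_grid = sigma2_next(sigma2_prev, delta, G, 1.0, grid).min()
    print(f"G={G}: lambda*={lam:.4f}, sigma2(lambda*)={at_opt:.6f}, grid min={on_grid:.6f}")
```

In every case $\lambda_t^*$ lies strictly between 0 and 1 and attains a value no larger than the grid minimum, consistent with Remark 1 below.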


Remarks:

1) It is shown in Appendix A that if $\sigma^2(t-1) + \delta^2(t) > \gamma^2$, then $\lambda_t^*$ given by (3.9) satisfies

$$\left.\frac{d\sigma^2(t)}{d\lambda_t}\right|_{\lambda_t = \lambda_t^*} = 0$$

and, furthermore, $0 < \lambda_t^* < 1$. Thus, unlike [11], no upper bound need be imposed on the forgetting factor.

2) Since $\sigma^2(t) = \sigma^2(t-1)$ if $\lambda_t = 0$, any nonzero value of $\lambda_t$ that minimizes $\sigma^2(t)$ will cause $\sigma^2(t) < \sigma^2(t-1)$. Thus, choosing $\lambda_t$ to minimize $\sigma^2(t)$ causes $\{\sigma^2(t)\}$ to be a nonincreasing sequence.

The recursive relations (3.6), the initial conditions (3.7), the selective update strategy (3.8), and the forgetting factor determination formula (3.9) form the extended optimal bounding ellipsoid (EOBE) estimation algorithm [14]. The choice of the threshold $\gamma$ will become clear from the analysis below. The algorithm retains the discerning update strategy and the modular adaptive filter structure of the OBE algorithm [11], [13].
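Putting (3.5)-(3.9) together gives the complete recursion. The following is a minimal Python sketch of the EOBE loop, not the authors' code: the function names, the ARMA(1, 1) test model, the uniform noise, and the threshold $\gamma = 1.5\gamma_0$ are all hypothetical choices for illustration (the paper derives the choice of $\gamma$ from its analysis, which this sketch does not reproduce). A zero regressor, which occurs at $t = 0$ under zero initial conditions, is treated as carrying no information and is skipped.

```python
import numpy as np


def optimal_lambda(sigma2_prev, delta, G, gamma2):
    """lambda_t^* from (3.9); called only when (3.8) calls for an update."""
    beta = (gamma2 - sigma2_prev) / delta**2                        # (3.9c)
    if np.isclose(G, 1.0):
        return (1.0 - beta) / 2.0                                   # (3.9a)
    return (1.0 - np.sqrt(G / (1.0 + beta * (G - 1.0)))) / (1.0 - G)  # (3.9b)


def eobe(y, n, r, gamma2, M=1e4, eps=1e-6):
    """EOBE sketch: OBE recursions driven by the regressor Phi(t) of (3.5)."""
    theta, P, sigma2 = np.zeros(n + r), M * np.eye(n + r), gamma2 - eps    # (3.7)
    e = np.zeros(len(y))                     # a posteriori prediction errors (3.4)
    sigma2_hist = np.zeros(len(y))
    updated = np.zeros(len(y), dtype=bool)
    for t in range(len(y)):
        past_y = [y[t - i] if t - i >= 0 else 0.0 for i in range(1, n + 1)]
        past_e = [e[t - j] if t - j >= 0 else 0.0 for j in range(1, r + 1)]
        Phi = np.array(past_y + past_e)                             # (3.5)
        delta = y[t] - theta @ Phi                                  # (3.6c)
        if sigma2 + delta**2 > gamma2 and np.any(Phi != 0.0):       # (3.8) fails
            PPhi = P @ Phi
            G = Phi @ PPhi                                          # (3.6e)
            lam = optimal_lambda(sigma2, delta, G, gamma2)
            d = 1.0 - lam + lam * G
            P = (P - lam * np.outer(PPhi, PPhi) / d) / (1.0 - lam)  # (3.6f)
            theta = theta + lam * (P @ Phi) * delta                 # (3.6b)
            sigma2 = ((1.0 - lam) * sigma2 + lam * gamma2
                      - lam * (1.0 - lam) * delta**2 / d)           # (3.6d)
            updated[t] = True
        e[t] = y[t] - theta @ Phi                                   # (3.4)
        sigma2_hist[t] = sigma2
    return theta, sigma2_hist, updated


# Hypothetical test model: y(t) = 0.5 y(t-1) + w(t) + 0.2 w(t-1), |w(t)| <= 0.1
rng = np.random.default_rng(2)
gamma0 = 0.1
w = rng.uniform(-gamma0, gamma0, 3000)
y = np.zeros_like(w)
for t in range(len(w)):
    y[t] = (0.5 * y[t - 1] + 0.2 * w[t - 1] if t > 0 else 0.0) + w[t]
theta, sigma2_hist, updated = eobe(y, n=1, r=1, gamma2=(1.5 * gamma0) ** 2)
print(f"estimate {theta}, updates at {updated.mean():.1%} of samples")
```

The recorded $\sigma^2(t)$ sequence should be nonincreasing, as Remark 2 states, regardless of the data.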

IV. Analysis of the EOBE Algorithm

The main difficulty in the analysis of the EOBE algorithm arises from the presence of the a posteriori prediction errors in the regressor vector. Unlike in the OBE algorithm, boundedness of $w(t)$ does not guarantee that all the convex polytopes $S_t$, $t = 1, 2, \ldots$, will contain $\theta^*$. The first step in the analysis is to find conditions under which this happens. The minimization of $\sigma^2(t)$ at every time instant and the choice of initial conditions (3.7) facilitate the characterization of the behavior of the a posteriori prediction errors.

Lemma 1: For the EOBE algorithm of Section III, if $\sigma^2(t-1) + \delta^2(t) > \gamma^2$, i.e., if an update occurs at time instant $t$, then

i) $$\sigma^2(t) + \varepsilon^2(t) = \gamma^2, \tag{4.1}$$

ii) $$\varepsilon^2(k) \le \varepsilon^2(t) \quad \text{for all time instants } k \le t, \tag{4.2}$$

and if $t + l$ is the time instant at which the next update occurs, then

iii) $$\varepsilon^2(k) \le \varepsilon^2(t) \quad \text{for all } k < t + l. \tag{4.3}$$

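Proof:

i) It has been shown in Appendix A that if $\sigma^2(t-1) + \delta^2(t) > \gamma^2$, then the optimum forgetting factor $\lambda_t^*$ satisfies

$$ \left.\frac{d\sigma^2(t)}{d\lambda_t}\right|_{\lambda_t = \lambda_t^*} = 0. \tag{4.4} $$

Taking the derivative in (3.6d) and using (4.4) yields

$$ \gamma^2 - \sigma^2(t-1) - \frac{(1-\lambda_t)^2\delta^2(t)}{(1-\lambda_t+\lambda_tG(t))^2} + \frac{\lambda_t^2\delta^2(t)G(t)}{(1-\lambda_t+\lambda_tG(t))^2} = 0 \tag{4.5a} $$

which can be rewritten in the form

$$ \gamma^2 - \sigma^2(t-1) = \frac{(1-\lambda_t)^2\delta^2(t) - \lambda_t^2\delta^2(t)G(t)}{(1-\lambda_t+\lambda_tG(t))^2}. \tag{4.5b} $$

In (4.5) and in the remainder of the paper, when there is no risk of confusion, the optimum forgetting factor $\lambda_t^*$ will be denoted by $\lambda_t$. It is also easily shown from (3.6b), (3.6c), and (3.6f) that the a posteriori and a priori prediction errors are related by

$$ \varepsilon(t) = \frac{1-\lambda_t}{1-\lambda_t+\lambda_tG(t)}\,\delta(t). \tag{4.6} $$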


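Note that the nonnegativeness of $G(t)$ implies that $\varepsilon^2(t) \le \delta^2(t)$. Substituting (4.6) in (4.5b) and rearranging terms yields

$$ (1-\lambda_t)\gamma^2 - (1-\lambda_t)\sigma^2(t-1) = (1-\lambda_t)\varepsilon^2(t) - \frac{\lambda_t^2G(t)\varepsilon^2(t)}{1-\lambda_t}. \tag{4.7} $$

Now using (4.6) in (3.6d) gives

$$ \sigma^2(t) = (1-\lambda_t)\sigma^2(t-1) + \lambda_t\gamma^2 - \lambda_t\varepsilon^2(t) - \frac{\lambda_t^2G(t)}{1-\lambda_t}\,\varepsilon^2(t). \tag{4.8} $$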


Finally, subtracting (4.8) from (4.7) gives (4.1).

ii) Case 1: If $k < t$ is an updating instant, then (4.1) gives

$$ \sigma^2(k) + \varepsilon^2(k) = \gamma^2. \tag{4.9} $$

But since $\{\sigma^2(t)\}$ is a nonincreasing sequence, $\sigma^2(k) \ge \sigma^2(t)$, and (4.9) and (4.1) together imply that $\varepsilon^2(k) \le \varepsilon^2(t)$.

Case 2: If $k < t$ is a nonupdating instant, then $\varepsilon(k) = \delta(k)$, and so by (3.8), $\sigma^2(k-1) + \varepsilon^2(k) \le \gamma^2$. Since $\{\sigma^2(t)\}$ is nonincreasing, $\varepsilon^2(k) \le \gamma^2 - \sigma^2(k-1) \le \gamma^2 - \sigma^2(t) = \varepsilon^2(t)$.

iii) Since $\lambda_k$, $k = t+1, t+2, \ldots, t+l-1$, are all zero, $\sigma^2(k) = \sigma^2(t)$ for all $t \le k < t+l$. And because each such $k > t$ is a nonupdating instant, $\sigma^2(k-1) + \varepsilon^2(k) = \sigma^2(t) + \varepsilon^2(k) \le \gamma^2$, and so (4.3) follows.
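The two facts Lemma 1 rests on, identity (4.1) at an update and the monotonicity of $\{\sigma^2(t)\}$, can be spot-checked numerically together with the bound $\varepsilon^2(t) \le \delta^2(t)$ that follows from (4.6). This sketch assumes the OBE-form update for $\sigma^2(t)$ that is differentiated to obtain (4.5a), and the forgetting factor of (A.4)/(A.5) with $\beta(t) = (\gamma^2 - \sigma^2(t-1))/\delta^2(t)$ inferred from (A.6); the function names and random test values are hypothetical.

```python
import math
import random


def optimal_lambda(sigma2_prev, delta, G, gamma2):
    """(A.4)/(A.5), assuming the test (3.8) has already called for an update.

    beta(t) = (gamma^2 - sigma^2(t-1)) / delta^2(t) is inferred from (A.6).
    """
    beta = (gamma2 - sigma2_prev) / delta ** 2
    if abs(G - 1.0) < 1e-9:
        return (1.0 - beta) / 2.0
    return (1.0 - math.sqrt(G / (1.0 + beta * (G - 1.0)))) / (1.0 - G)


def lemma1_check(sigma2_prev, delta, G, gamma2):
    """Return (lambda, residual of (4.1), eps(t), sigma^2(t)) for one updating instant."""
    lam = optimal_lambda(sigma2_prev, delta, G, gamma2)
    D = 1.0 - lam + lam * G
    # assumed OBE form of (3.6d)
    sigma2 = (1 - lam) * sigma2_prev + lam * gamma2 - lam * (1 - lam) * delta ** 2 / D
    eps = (1 - lam) * delta / D          # a posteriori error, (4.6)
    return lam, sigma2 + eps ** 2 - gamma2, eps, sigma2


rng = random.Random(1)
for _ in range(3):
    s_prev = rng.uniform(0.0, 24.0)
    # choose delta so that sigma^2(t-1) + delta^2(t) > gamma^2, i.e. (3.8) updates
    d = math.sqrt(25.0 - s_prev) * rng.uniform(1.01, 3.0)
    g = rng.uniform(0.01, 10.0)
    print(lemma1_check(s_prev, d, g, 25.0))
```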

We can now derive sufficient conditions under which the convex polytopes $S_t$ and the ellipsoids $E_t$ will contain $\theta^*$.

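Theorem 1: The convex polytopes $S_t$ and, consequently, the ellipsoids $E_t$, $t = 1, 2, \ldots$, will contain the true parameter $\theta^*$ if

i) $E_0$ contains $\theta^*$; (4.10a)

ii) the true moving average coefficients satisfy
$$ \Bigl(\sum_{i=1}^{r}|c_i|\Bigr)^2 < 0.5; \tag{4.10b} $$

iii) the threshold $\gamma^2$ satisfies
$$ \gamma^2 \ge \frac{2\gamma_w^2\bigl(1 + \sum_{i=1}^{r}|c_i|\bigr)^2}{1 - 2\bigl(\sum_{i=1}^{r}|c_i|\bigr)^2}. \tag{4.10c} $$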


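Proof: Let the induction hypothesis be $\theta^* \in E_{t-1}$. Then defining

$$ V(t) = (\theta(t) - \theta^*)^T P^{-1}(t)(\theta(t) - \theta^*) \tag{4.11} $$

and recalling the definition of $E_{t-1}$ yields

$$ V(t-1) \le \sigma^2(t-1) \tag{4.12} $$

and, since $P^{-1}(t)$ is positive definite for all $t$, $\sigma^2(t-1) \ge 0$.

Now, using (3.1) and (3.5),

$$ (y(t) - \theta^{*T}\Phi(t))^2 = \bigl(C(q^{-1})[w(t)] - (C(q^{-1}) - 1)[\varepsilon(t)]\bigr)^2 $$

where the operator $C(q^{-1})$ has been defined in (1.2). Defining $n(t) = C(q^{-1})[w(t)]$, and recalling the elementary algebraic inequality

$$ (a - b)^2 \le 2a^2 + 2b^2 $$

yields

$$ (y(t) - \theta^{*T}\Phi(t))^2 \le 2n^2(t) + 2\bigl(c_1\varepsilon(t-1) + c_2\varepsilon(t-2) + \cdots + c_r\varepsilon(t-r)\bigr)^2. \tag{4.13} $$

But

$$ 2n^2(t) \le \bar{\gamma}^2 \quad \text{for all } t, \quad \text{where } \bar{\gamma}^2 = 2\gamma_w^2\Bigl(1 + \sum_{i=1}^{r}|c_i|\Bigr)^2. \tag{4.14} $$

Hence,

$$ (y(t) - \theta^{*T}\Phi(t))^2 \le \bar{\gamma}^2 + 2\bigl(|c_1||\varepsilon(t-1)| + |c_2||\varepsilon(t-2)| + \cdots + |c_r||\varepsilon(t-r)|\bigr)^2. \tag{4.15} $$

But by Lemma 1, if $t - j$ is the updating instant immediately preceding time instant $t$, then

$$ |\varepsilon(t-i)| \le |\varepsilon(t-j)| \quad \text{for } 1 \le i \le r. $$

Thus

$$ (y(t) - \theta^{*T}\Phi(t))^2 \le \bar{\gamma}^2 + 2\Bigl(\sum_{i=1}^{r}|c_i|\Bigr)^2\bigl(\gamma^2 - \sigma^2(t-1)\bigr) $$

since $\varepsilon^2(t-j) = \gamma^2 - \sigma^2(t-j) = \gamma^2 - \sigma^2(t-1)$. Now, by the induction hypothesis, $\sigma^2(t-1) \ge 0$. Hence,

$$ (y(t) - \theta^{*T}\Phi(t))^2 \le \bar{\gamma}^2 + 2\Bigl(\sum_{i=1}^{r}|c_i|\Bigr)^2\gamma^2. \tag{4.16} $$

So the convex polytope $S_t$ will contain $\theta^*$ if

$$ \bar{\gamma}^2 + 2\Bigl(\sum_{i=1}^{r}|c_i|\Bigr)^2\gamma^2 \le \gamma^2. \tag{4.17} $$

The inequality (4.17) will hold iff (4.10b) and (4.10c) are true. Assuming (4.10b) and (4.10c) thus guarantees that for all time instants $t$

$$ (y(t) - \theta^{*T}\Phi(t))^2 \le \gamma^2. \tag{4.18} $$

Using (3.6) and (4.11), it can be shown that

$$ V(t) - \sigma^2(t) \le (1-\lambda_t)\bigl(V(t-1) - \sigma^2(t-1)\bigr) + \lambda_t\bigl[(y(t) - \theta^{*T}\Phi(t))^2 - \gamma^2\bigr] \tag{4.19} $$

and so from (4.18) it follows that

$$ V(t) - \sigma^2(t) \le (1-\lambda_t)\bigl(V(t-1) - \sigma^2(t-1)\bigr). \tag{4.20} $$

Finally, by (4.12), it follows that

$$ V(t) - \sigma^2(t) \le 0, \tag{4.21} $$

i.e., $E_t$ contains $\theta^*$, and $\sigma^2(t)$ is nonnegative for all $t$.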
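Conditions (4.10b) and (4.10c) are easy to evaluate for a given model. A small sketch, assuming the noise bound $|w(t)| \le \gamma_w$ and the constant $\bar{\gamma}^2 = 2\gamma_w^2(1 + \sum|c_i|)^2$ of (4.14); the helper name is hypothetical. For the MA coefficients of Example 1 in Section V with $\gamma_w = 1$ (uniform noise on $[-1, 1]$), it reproduces the lower bound $\gamma^2 \ge 8.54$ quoted there.

```python
def theorem1_conditions(c, gamma_w):
    """Check (4.10b) and return the smallest gamma^2 allowed by (4.10c)."""
    s = sum(abs(ci) for ci in c)                       # sum of |c_i|
    if not s ** 2 < 0.5:                               # (4.10b) violated
        return False, float("inf")
    gamma_bar2 = 2.0 * gamma_w ** 2 * (1.0 + s) ** 2   # bar-gamma^2 of (4.14)
    return True, gamma_bar2 / (1.0 - 2.0 * s ** 2)     # right-hand side of (4.10c)


ok, bound = theorem1_conditions([-0.22, 0.17, -0.1], gamma_w=1.0)
print(ok, round(bound, 2))  # → True 8.54
```

The same call with the Example 2 coefficients returns `False`, matching the statement in Section V that (4.10b) is violated there.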
Remarks:

1) The assumption (4.10b) says that the noise sequence $n(t) = C(q^{-1})[w(t)]$ should not be "too colored." This condition is analogous to the Strictly Positive Real (SPR) condition which appears in the ELS algorithm (cf. Section I). It is not very difficult to show that for the SPR condition to hold, it is necessary that

$$ \sum_{i=1}^{r} c_i^2 < 1. \tag{4.22} $$

It can also be seen that condition (4.10b) is a stricter form of the Strictly Dominant Passive (SDP) condition [15] which appears in the analysis of some signed LMS algorithms, and from [15] it follows that (4.10b) is sufficient for the SPR condition to hold and hence is more restrictive than the SPR condition.

2) Selection of the right "noise bound" $\gamma^2$ is made possible by (4.10c). The user would, however, need to have some knowledge of the magnitude of the true moving average coefficients. Simulation results show that overestimation of $\gamma^2$ has very little effect on the parameter estimates (centers of the bounding ellipsoids), although it may have an adverse effect on the size of the bounding ellipsoids.

3) The conditions (4.10b) and (4.10c) are not necessary conditions, and the algorithm has been observed to perform well in several examples where these conditions were violated.
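The ordering of conditions described in Remark 1 can be illustrated numerically. The sketch assumes the usual frequency-domain form of the ELS condition, $\mathrm{Re}[1/C(e^{j\omega})] > 1/2$ for all $\omega$, with $C$ monic, and checks it only on a finite grid, so it is an illustration rather than a proof. Alongside the MA coefficients of Examples 1 and 2 of Section V, it uses a hypothetical MA(1) polynomial with $c_1 = 0.8$.

```python
import cmath
import math


def C_of(c, omega):
    """C(e^{j omega}) = 1 + sum_i c_i e^{-j i omega}, assuming a monic MA polynomial."""
    return 1 + sum(ci * cmath.exp(-1j * (i + 1) * omega) for i, ci in enumerate(c))


def spr_on_grid(c, n_grid=4096):
    """Grid check of Re[1/C(e^{j omega})] > 1/2 (assumed form of the ELS SPR condition)."""
    return all((1 / C_of(c, 2 * math.pi * k / n_grid)).real > 0.5 for k in range(n_grid))


def compare_conditions(c):
    s = sum(abs(ci) for ci in c)
    return {
        "4.10b": s ** 2 < 0.5,                    # sufficient for SPR (Remark 1)
        "SPR (grid)": spr_on_grid(c),
        "4.22": sum(ci ** 2 for ci in c) < 1.0,   # necessary for SPR (Remark 1)
    }


for c in ([-0.22, 0.17, -0.1], [0.8], [-0.86, -0.431]):
    print(c, compare_conditions(c))
```

For $c_1 = 0.8$ the grid check passes while (4.10b) fails, consistent with (4.10b) being more restrictive than SPR. For the Example 2 coefficients, (4.22) holds but the grid check fails (already at $\omega = 0$), consistent with (4.22) being only necessary.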

The following result follows straightforwardly from Lemma 1 and Theorem 1.

Corollary 1: If the conditions of Theorem 1 hold, then

a)
$$ \lim_{j\to\infty} \varepsilon^2(t_j) \ \text{exists} \tag{4.23a} $$

where $\{t_j\}$ is the subsequence of updating instants of the EOBE algorithm, and

b) the a posteriori prediction errors are uniformly bounded:

$$ \varepsilon^2(t) \le \gamma^2 \quad \text{for all time instants } t. \tag{4.23b} $$

Boundedness of $\delta^2(t)$, the a priori prediction error, and convergence of the parameter estimates to a neighborhood of the true parameter can be assured by requiring the regressor vector to be persistently exciting. The next lemma relates the positive definiteness of $P^{-1}(t)$ to the richness of the regressor vector $\Phi(t)$.

Lemma 2: If there exist positive $\alpha_1$ and $N$ such that, for all $t$,

$$ \sum_{i=t}^{t+N} \Phi(i)\Phi^T(i) \ge \alpha_1 I > 0 \tag{4.24a} $$

then there exists a positive $\alpha_2$ such that

$$ P^{-1}(t) \ge \alpha_2 I > 0. \tag{4.24b} $$

Proof of the lemma is the same as that of Theorem 4.1 of [11]; it is thus omitted here.

Remark: The uniform positive definiteness of $P^{-1}(t)$ in (4.24b) implies that the eigenvalues of $P(t)$ are upper bounded by $1/\alpha_2$.

Theorem 2: If the assumptions of Theorem 1 are satisfied and (4.24a) holds, then the EOBE algorithm ensures the following.

a) Parameter difference convergence:

$$ \lim_{t\to\infty} \|\theta(t) - \theta(t-k)\| = 0 \quad \text{for any finite } k. \tag{4.25} $$

b) Asymptotically bounded parameter estimation errors:

$$ \limsup_{t\to\infty} \|\theta(t) - \theta^*\|^2 \le \frac{2\gamma_w^2\bigl(1 + \sum_{i=1}^{r}|c_i|\bigr)^2}{\alpha_2} \tag{4.26} $$

where $\gamma_w$ and $\alpha_2$ are as in (3.2) and (4.24b), respectively.

c) If, in addition, the process (1.1) is stable, then the algorithm yields asymptotically bounded a priori prediction errors:

$$ \delta^2(t) \rightarrow [0, \gamma^2] \quad \text{as } t \to \infty. \tag{4.27} $$

Proof:

a) From (3.6b) and (3.6f),

$$ \|\theta(t) - \theta(t-1)\|^2 = \frac{\lambda_t^2\,\Phi^T(t)P^2(t-1)\Phi(t)\,\delta^2(t)}{(1-\lambda_t+\lambda_tG(t))^2} \tag{4.28} $$

$$ \le \frac{\lambda_t^2\,e_{\max}\{P(t-1)\}\,G(t)\,\delta^2(t)}{(1-\lambda_t+\lambda_tG(t))^2} \tag{4.29} $$

where $e_{\max}\{P(t-1)\}$ is the maximum eigenvalue of $P(t-1)$, and $\|\cdot\|$ denotes the Euclidean norm. Using (3.6d) in (4.5a) yields

$$ \sigma^2(t) = \sigma^2(t-1) - \frac{\lambda_t^2\delta^2(t)G(t)}{(1-\lambda_t+\lambda_tG(t))^2}. \tag{4.30} $$

The nonnegativity of $\sigma^2(t)$ therefore implies

$$ \sum_{i=1}^{t} \frac{\lambda_i^2\delta^2(i)G(i)}{(1-\lambda_i+\lambda_iG(i))^2} = \sigma^2(0) - \sigma^2(t) \le \sigma^2(0) < \infty \quad \text{for all } t. \tag{4.31} $$

Hence,

$$ \lim_{t\to\infty} \frac{\lambda_t^2\delta^2(t)G(t)}{(1-\lambda_t+\lambda_tG(t))^2} = 0. \tag{4.32} $$

If (4.24a) holds, then by Lemma 2, $e_{\max}\{P(t-1)\}$, the maximum eigenvalue of $P(t-1)$, is bounded for all $t$, and hence (4.29) and (4.32) yield

$$ \lim_{t\to\infty} \|\theta(t) - \theta(t-1)\| = 0. \tag{4.33} $$

Applying the Minkowski inequality to $\|\theta(t) - \theta(t-k)\|$ and using (4.33) completes the proof of (4.25).

b) Using (3.6), (4.11), and (4.6), an expression similar to (4.19) can be derived as

$$ V(t) = (1-\lambda_t)V(t-1) + \lambda_t\bigl(C(q^{-1})[w(t)] - (C(q^{-1}) - 1)[\varepsilon(t)]\bigr)^2 - \lambda_t\,\frac{1-\lambda_t+\lambda_tG(t)}{1-\lambda_t}\,\varepsilon^2(t). \tag{4.34} $$
Just as in the proof of Theorem 1, (4.34) can be expressed as

$$ V(t) \le (1-\lambda_t)V(t-1) + \lambda_t\bar{\gamma}^2 + \lambda_t\Bigl[2\Bigl(\sum_{i=1}^{r}|c_i|\Bigr)^2\varepsilon^2(t-j) - \frac{1-\lambda_t+\lambda_tG(t)}{1-\lambda_t}\,\varepsilon^2(t)\Bigr] \tag{4.35} $$

where $\bar{\gamma}^2$ is as in (4.14), and $t - j$ is the updating instant immediately preceding time instant $t$. Assume $t$ is an updating instant. Then (4.10b), (4.2), and the nonnegativity of $G(t)$ imply that the term in square brackets on the right-hand side of (4.35) is not positive, and so

$$ V(t) \le (1-\lambda_t)V(t-1) + \lambda_t\bar{\gamma}^2. \tag{4.36} $$

If $t$ is not an updating instant, then $\lambda_t = 0$ and (4.36) still follows from (4.35). A nonrecursive form for (4.36) can be obtained as

$$ V(t) \le \prod_{i=1}^{t}(1-\lambda_i)\,V(0) + \bar{\gamma}^2\sum_{i=1}^{t} q_{it} \tag{4.37} $$

where

$$ q_{it} = \lambda_i(1-\lambda_{i+1})(1-\lambda_{i+2})\cdots(1-\lambda_t). \tag{4.38} $$

For large $t$, the first term on the right-hand side of (4.37) can be neglected. In Appendix B, it is shown that

$$ \sum_{i=1}^{t} q_{it} \le 1. \tag{4.39} $$

Hence, for large enough $t$,

$$ V(t) = (\theta(t) - \theta^*)^T P^{-1}(t)(\theta(t) - \theta^*) \le \bar{\gamma}^2 \tag{4.40} $$

and so (4.26) follows from Lemma 2 and (4.14).

c) Stability of the process (1.1) and the boundedness of $w(t)$ imply that the outputs $y(t)$ are bounded. Hence, from (3.6e), (4.23b), and Lemma 2, it follows that

$$ G(t) \le \frac{1}{\alpha_2}\Bigl[r\gamma^2 + n\max_{t-n \le i \le t-1} y^2(i)\Bigr] < \infty \tag{4.41} $$

where $n$ is the order of the AR process and $r$ is the order of the MA process. It can now be shown, just as in Theorem 3.2 of [11], that the a priori prediction errors satisfy (4.27).

Remarks:

1) The results of Theorem 1, and the results (4.25), (4.26) of Theorem 2, do not require the process to be stable. However, if the process is unstable, then on account of finite precision effects, the matrix $P(t)$ may not stay positive definite, thus invalidating the notion of bounding ellipsoids and causing the algorithm to fail. In this situation, the ELS algorithm will fail, too.

2) Theorems 1 and 2 do not impose any statistical properties on the input noise sequence $\{w(t)\}$. However, our simulation experience has been that the parameter estimates are usually not close to the true parameters if the noise is not white. Of course, such is also the case for the ELS algorithm.


V. Simulation Results

Simulations have been performed to investigate the performance of the EOBE algorithm vis-à-vis the ELS algorithm. In this paper, we present simulation results for two examples: a broad-band ARMA(3, 3) process and a narrow-band ARMA(2, 2) process, where the indexes $n, r$ in an ARMA($n, r$) process refer to the orders of the $A(q^{-1})$ and $C(q^{-1})$ polynomials, respectively.

Example 1: Broad-band ARMA(3, 3) Process. The output data $\{y(t)\}$ are generated by the following difference equation:

$$ y(t) = -0.4y(t-1) + 0.2y(t-2) + 0.6y(t-3) + w(t) - 0.22w(t-1) + 0.17w(t-2) - 0.1w(t-3). $$

The noise sequence $\{w(t)\}$ is generated by a pseudorandom number generator with a uniform probability distribution in $[-1.0, 1.0]$. The upper bound $\gamma^2$ was set equal to 25. The parameter estimates were obtained by applying the EOBE algorithm to 1000 point data sequences. Twenty-five runs of the algorithm were performed on the same model but with different input noise sequences. The average squared parameter error $L_a(t)$ is computed for the AR coefficients according to the formula

$$ L_a(t) = \frac{1}{25}\sum_{j=1}^{25} l_j(t) $$

where $l_j(t)$, the squared AR parameter error at time $t$ for the $j$th run, is defined by

$$ l_j(t) = \sum_{i=1}^{3}\bigl(a_{i,j}(t) - a_i\bigr)^2 $$

with $a_i$ and $a_{i,j}(t)$ (the estimate of $a_i$ at time $t$ in the $j$th run) being defined by (1.1) and (3.3), respectively. The average squared parameter error $L_c(t)$ for the MA coefficients is defined analogously. Figs. 2 and 3 display the average squared estimation errors for AR and MA parameters using both the EOBE and the ELS algorithms. The curves show that the performance of the two algorithms is comparable. The average number of updates for the EOBE algorithm was 160 for 1000 point data sequences. Thus, only 16% of the samples are used for updates, as compared to the ELS algorithm which updates at every sampling instant.
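The averaging behind $L_a(t)$ is a plain mean over runs of per-run squared errors. A minimal sketch, assuming the estimates are stored as nested lists indexed by run, then time, then coefficient (a hypothetical layout); evaluating the same average at $t = 1000$ over the full parameter vector gives the quantity $T$ reported in Table I.

```python
def average_squared_error(estimates, true_params):
    """L(t) = (1/J) sum_j sum_i (estimate_{i,j}(t) - true_i)^2 over J runs.

    estimates[j][t] is the coefficient vector of run j at time t (hypothetical layout).
    """
    runs = len(estimates)
    horizon = len(estimates[0])
    return [
        sum(
            sum((e - p) ** 2 for e, p in zip(estimates[j][t], true_params))
            for j in range(runs)
        ) / runs
        for t in range(horizon)
    ]


# two hypothetical runs, two time steps, two coefficients
demo = [[[0.0, 0.0], [-0.4, 0.2]], [[0.0, 0.2], [-0.4, 0.1]]]
print(average_squared_error(demo, [-0.4, 0.2]))
```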
Fig. 2. Average squared AR parameter estimation error for the EOBE and ELS algorithms, Example 1.

Fig. 3. Average squared MA parameter estimation error for the EOBE and ELS algorithms, Example 1.


TABLE I

| Upper bound $\gamma^2$ | Average tap error $T$ | Average number of updates | Total number of times $\theta^*$ is out of $S_t$ | Total number of times $\theta^*$ is out of $E_t$ | Average final volume | Average final sum of axes |
|---|---|---|---|---|---|---|
| 0.5 | 0.031 | 160 | 7309 | 23952 | negative | ? |
| 1.0 | 0.031 | 160 | 315 | 0 | 0.22 | 10.46 |
| 2.0 | 0.031 | 160 | 0 | 0 | 2.6 × 10^? | ? |
| 5.0 | 0.031 | 154 | 0 | 0 | 5.4 × 10^? | 265 |
| 25.0 | 0.031 | 153 | 0 | 0 | 2.1 × 10^? | 15.?7 |
| 100.0 | 0.0308 | 156 | 0 | 0 | 1.0 × 10^? | 6303 |

The effect of different choices for the upper bound $\gamma^2$ on the performance has also been studied. For each value of $\gamma^2$, the asymptotic average squared parameter error $T$ was computed over 25 runs of the algorithm, according to the formula

$$ T = \frac{1}{25}\sum_{j=1}^{25}\|\theta_j(1000) - \theta^*\|^2 $$

where $\theta_j(1000)$ is the parameter estimate at the 1000th iteration in the $j$th run. The lower bound on $\gamma^2$ as calculated from (4.10c) is $\gamma^2 \ge 8.54$. The second column of Table I lists the different values of $T$ obtained when $\gamma^2$ is varied from 0.5 to 100. It is clear that the centers of the bounding ellipsoids are insensitive to the value of $\gamma^2$, since the tap error is almost constant. However, the final size of the ellipsoids does depend on $\gamma^2$. The negative volume obtained when $\gamma^2 = 0.5$ is an indication of the fact that $\sigma^2(t)$ is no longer positive and so bounding ellipsoids cannot be constructed.

The performance of the algorithm, when the noise sequence $\{w(t)\}$ has a Gaussian distribution, was evaluated in a similar fashion. A constant value of $\gamma^2 = 25$ was used and the standard deviation of the noise was varied. The results for 25 runs of the algorithm are shown in Table II. It is clear that the unbounded noise has a marginal effect on the parameter estimates.

Finally, the tracking capability of the EOBE algorithm was compared to that of the ELS algorithm (with forgetting factor = 0.99). The same model was used to generate 400 data points. The parameters were then changed by 150% and the next 400 points were generated. Finally, the last 200 points were generated by using the original parameters. The average squared parameter error was evaluated over 25 runs and is shown in Fig. 4. Even though the formulation of bounding ellipsoids is based on the assumption that the parameters are constant, the simulation results show that the algorithm is able to accommodate changes in model parameters. Analysis of the tracking ability of the algorithm is currently under investigation.

Example 2: Narrow-band ARMA(2, 2) Process. The output data $\{y(t)\}$ are generated by the following difference equation:

$$ y(t) = 1.4y(t-1) - 0.95y(t-2) + w(t) - 0.86w(t-1) - 0.431w(t-2). $$

Note that, in this case, condition (4.10b) of Theorem 1 is violated. The noise sequence is uniformly distributed in $[-1.0, 1.0]$, as in the first example. The upper bound $\gamma^2$ was set equal to 25. The average squared AR and MA parameter estimation errors are calculated over twenty-five runs and plotted in Figs. 5 and 6, respectively. The average number of updates was 78 for 1000 point data sequences.

For this example too, different values of the upper bound $\gamma^2$ were used and no significant difference in the quality of estimates, number of updates, or convergence rate was observed. Thus, it is verified once again that a precise knowledge of the upper bound is not a prerequisite for satisfactory performance of the algorithm.

VI. Conclusions

A recursive parameter estimation algorithm has been extended for ARMA parameter estimation. The main features of the algorithm are a membership set theoretic formulation and a discerning update strategy. Convergence analysis of the algorithm has been performed under the assumption that the noise is bounded. The main results of the analysis are that all the bounding ellipsoids will contain the true parameter, provided the true moving average coefficients satisfy a condition which is analogous to the SPR condition of the ELS algorithm. In addition, the algorithm yields uniformly bounded a posteriori prediction errors. With a persistence of excitation condition on the regressor vector, boundedness of the a priori prediction errors can then be established and the parameter estimates are shown to converge to a neighborhood of the true parameters. Simulation results show that the performance of the algorithm is comparable to the ELS algorithm while requiring far fewer updates.

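Appendix A

Proof of (3.8) and (3.9): The proof is along the lines of the proof of Lemma 2.1 in [11]. Writing $\sigma^2(t, \lambda_t)$ for $\sigma^2(t)$ as a function of $\lambda_t$, since $\lambda_t^*$ minimizes $\sigma^2(t)$,

$$ \sigma^2(t) \le \sigma^2(t, 0) = \sigma^2(t-1) \tag{A.1} $$

and

$$ \frac{d\sigma^2(t)}{d\lambda_t} = \gamma^2 - \sigma^2(t-1) - \frac{\delta^2(t)\bigl[(1-\lambda_t)^2 - \lambda_t^2G(t)\bigr]}{(1-\lambda_t+\lambda_tG(t))^2} \tag{A.2} $$

and

$$ \frac{d^2\sigma^2(t)}{d\lambda_t^2} = \frac{2\delta^2(t)G(t)}{(1-\lambda_t+\lambda_tG(t))^3}. \tag{A.3} $$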

Thus, $d^2\sigma^2(t)/d\lambda_t^2 > 0$, unless $\delta^2(t) = 0$ or $G(t) = 0$. Since $P(t-1)$ is positive definite, $G(t) = 0$ iff $\Phi(t) = 0$. The algorithm can be modified to detect the occurrence of a null $\Phi(t)$ and set it to a small nonzero value prior to the calculation of $G(t)$. Thus, it can be assumed that $G(t) \ne 0$ for all $t$. If $\delta^2(t) = 0$ ($\sigma^2(t-1) + \delta^2(t) \le \gamma^2$ in this case), then, since $\sigma^2(0) \le \gamma^2$ by (3.7) and $\sigma^2(t)$ is nonincreasing, by (A.2) $d\sigma^2(t)/d\lambda_t$ is nonnegative, and hence $\sigma^2(t)$ is minimized if $\lambda_t^* = 0$. For the sequel ($\delta(t) \ne 0$), the second derivative of $\sigma^2(t)$ can therefore be assumed to be positive, and hence the unique minimum occurs at $d\sigma^2(t)/d\lambda_t = 0$. With $\beta(t) = (\gamma^2 - \sigma^2(t-1))/\delta^2(t)$, it follows from (A.2) that if $G(t) = 1$, $\sigma^2(t)$ is minimized if

$$ \lambda_t^* = \frac{1 - \beta(t)}{2}. \tag{A.4} $$

Otherwise, if $G(t) \ne 1$, $\sigma^2(t)$ is minimized if

$$ \lambda_t^* = \frac{1}{1 - G(t)}\left[1 - \sqrt{\frac{G(t)}{1 + \beta(t)(G(t) - 1)}}\,\right]. \tag{A.5} $$

Moreover, in (A.4) and (A.5),

$$ \lambda_t^* > 0 \iff \beta(t) < 1 \iff \sigma^2(t-1) + \delta^2(t) > \gamma^2. \tag{A.6} $$

It is easy to show that $1 + \beta(t)(G(t) - 1)$ is always positive. Since $\sigma^2(0) \le \gamma^2$ and $\sigma^2(t)$ is nonincreasing, $\beta(t) \ge 0$. From (A.6), $\beta(t) < 1$; hence $1 - 1/\beta(t) < 0$. Then

$$ 1 + \beta(t)(G(t) - 1) \le 0 \implies G(t) \le 1 - 1/\beta(t) \implies G(t) < 0 $$

which is a contradiction. Thus, (A.5) always yields a real $\lambda_t^*$. It is now shown that (A.4) and (A.5) yield values of $\lambda_t^*$ which are upper bounded by unity. If $G(t) = 1$, then since $\beta(t) \ge 0$, (A.4) yields $\lambda_t^* < 1$. If $G(t) < 1$, then

$$ \lambda_t^* \ge 1 \iff 1 - \left[\frac{G(t)}{1 + \beta(t)(G(t) - 1)}\right]^{1/2} \ge 1 - G(t) \iff G(t)\bigl(1 + \beta(t)(G(t) - 1)\bigr) \ge 1. \tag{A.7} $$

But $G(t) < 1$ and $\beta(t) \ge 0$ contradict (A.7). Hence, if $G(t) < 1$, then $\lambda_t^* < 1$. It can be shown in exactly the same way that $G(t) > 1$ would imply that $\lambda_t^* < 1$. Thus, unlike the case in [11], no upper bound has to be imposed on the forgetting factor.
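The closed forms (A.2), (A.4), and (A.5) can be sanity-checked numerically: at $\lambda_t^*$ the derivative (A.2) should vanish, $\lambda_t^*$ should lie in $(0, 1)$, and no other $\lambda_t \in [0, 1)$ should give a smaller $\sigma^2(t)$. The sketch assumes the OBE-form $\sigma^2(t)$ update that (A.2) differentiates, and $\beta(t) = (\gamma^2 - \sigma^2(t-1))/\delta^2(t)$ as implied by (A.6). The test point $\sigma^2(t-1) = 10$, $\delta(t) = 5$, $\gamma^2 = 25$ is arbitrary and covers $G(t) < 1$, $G(t) = 1$, and $G(t) > 1$; function names are hypothetical.

```python
import math


def sigma2_of(lam, sigma2_prev, delta, G, gamma2):
    """sigma^2(t) as a function of lambda_t (assumed OBE form of (3.6d))."""
    D = 1.0 - lam + lam * G
    return (1 - lam) * sigma2_prev + lam * gamma2 - lam * (1 - lam) * delta ** 2 / D


def dsigma2(lam, sigma2_prev, delta, G, gamma2):
    """Closed-form first derivative (A.2)."""
    D = 1.0 - lam + lam * G
    return gamma2 - sigma2_prev - delta ** 2 * ((1 - lam) ** 2 - lam ** 2 * G) / D ** 2


def lambda_star(sigma2_prev, delta, G, gamma2):
    """Minimizer (A.4)/(A.5) with beta(t) = (gamma^2 - sigma^2(t-1)) / delta^2(t)."""
    beta = (gamma2 - sigma2_prev) / delta ** 2
    if G == 1.0:
        return (1.0 - beta) / 2.0                                        # (A.4)
    return (1.0 - math.sqrt(G / (1.0 + beta * (G - 1.0)))) / (1.0 - G)   # (A.5)


# arbitrary point where (3.8) calls for an update: 10 + 5^2 > 25
for G in (0.3, 1.0, 4.0):
    lam = lambda_star(10.0, 5.0, G, 25.0)
    print(f"G={G}: lambda*={lam:.4f}, (A.2) at lambda* = {dsigma2(lam, 10.0, 5.0, G, 25.0):.1e}")
```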
Appendix B

Proof of (4.39) (by Induction): Let

$$ R(t) = \sum_{i=1}^{t} q_{it}. \tag{B.1} $$

Then

$$ R(t) = (1-\lambda_t)R(t-1) + \lambda_t \tag{B.2} $$

and

$$ R(1) = \lambda_1 < 1. $$

Assume $R(t-1) \le 1$. Then by (B.2),

$$ R(t) \le (1-\lambda_t)\cdot 1 + \lambda_t, $$

i.e., $R(t) \le 1$.
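The induction can be mirrored in a few lines: computing $\sum_i q_{it}$ directly from (4.38) and through the recursion (B.2) should agree. Unrolling (B.2) from $R(0) = 0$ also gives the closed form $R(t) = 1 - \prod_{i=1}^{t}(1-\lambda_i)$, which makes (4.39) immediate for $0 \le \lambda_i < 1$. The random forgetting factors in the sketch, mostly zero to mimic nonupdating instants, are arbitrary.

```python
import random


def q_sum_direct(lams, t):
    """sum_{i=1}^t q_{it}, with q_{it} = lambda_i (1 - lambda_{i+1}) ... (1 - lambda_t), per (4.38)."""
    total = 0.0
    for i in range(1, t + 1):
        q = lams[i - 1]
        for k in range(i + 1, t + 1):
            q *= 1.0 - lams[k - 1]
        total += q
    return total


def q_sum_recursive(lams):
    """R(t) = (1 - lambda_t) R(t-1) + lambda_t, per (B.2), starting from R(0) = 0."""
    R, out = 0.0, []
    for lam in lams:
        R = (1.0 - lam) * R + lam
        out.append(R)
    return out


rng = random.Random(3)
lams = [rng.random() if rng.random() < 0.2 else 0.0 for _ in range(60)]
print(max(q_sum_recursive(lams)))  # never exceeds 1, as (4.39) states
```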
RECENT DEVELOPMENTS IN OPTIMAL BOUNDING ELLIPSOIDAL PARAMETER ESTIMATION

Ashok K. RAO

COMSAT Labs, Clarksburg, MD 20874, USA

Yih-Fang HUANG

Department of Electrical & Computer Engineering, University of Notre Dame, Notre Dame, IN 46556, USA


The Optimal Bounding Ellipsoid (OBE) algorithms are viable alternatives to conventional adaptive filtering algorithms in situations where the noise does not satisfy the usual stationarity and whiteness assumptions. An example is shown in which the performance of an OBE algorithm is seen to be markedly superior to that of the recursive least-squares algorithm. Subsequently, an overview of some recent work in the area of OBE parameter estimation is presented. A lattice filter implementation of one particular OBE algorithm is first described. The extension of the OBE algorithm to the estimation of parameters of ARMA models is performed and the results of a convergence analysis are presented. It is demonstrated through a simulation example that the transient performance of the proposed algorithm is superior to that of the well-known extended least-squares algorithm.


1. Introduction

In recent years, there has been a resurgence of interest in an alternative approach to parameter estimation, which has been termed membership set parameter estimation by some authors [1,2]. This approach is particularly appropriate when the probability distribution of the disturbances is unknown, and a bound on the magnitude of the disturbances is available [2,3]. In contrast to conventional system identification schemes (e.g. maximum likelihood, least squares etc. [4]) which yield point estimates of the parameters, a membership set algorithm yields a set of parameter estimates which are compatible with the model, data, and noise bounds. This set of parameters, which is usually a convex polytope in the parameter space, may become extremely complicated to formulate and so it may be necessary to approximate the set.

In this paper, the discussions will be concentrated on the ellipsoidal outer bounding approach which approximates the exact membership set at each instant by an ellipsoid in the parameter space. The algorithms in this class [2,5-7] are temporally recursive and yield ellipsoids which are optimal, in a sense to be defined later. The computational complexity of the Optimal Bounding Ellipsoid (OBE) algorithms is much lower than that of the exact polytope bounding algorithms [8] and non-recursive linear programming based algorithms [9]. The ellipsoidal formulation also helps to make the analysis tractable. Furthermore, a discerning update strategy, which proves to be appealing for recursive algorithms, evolves quite naturally in the optimization of the ellipsoids. A disadvantage of the OBE algorithms is the possible looseness of the ellipsoidal outer bounds.

The objective of this paper is to provide an overview of some recent developments and applications of the OBE algorithms. It begins by providing a brief review of the various OBE algorithms. The superiority of the algorithms vis-à-vis commonly used algorithms like the Recursive Least-Squares (RLS) algorithm in situations where the noise does not satisfy the conventional stationarity and whiteness assumptions will be demonstrated by means of an example. In Section 3, an approximate lattice implementation of one of the OBE algorithms will be described [10]. An extension of the OBE algorithm to the estimation of parameters of ARMA models will be presented in Section 4. A simulation example will be presented to compare the transient performance of the extended algorithm to that of the well-known Extended Least-Squares (ELS) algorithm.


2. The OBE algorithms

The OBE algorithms estimate the coefficients of autoregressive with exogenous input (ARX) processes described by [11]

$$ y(t) = a_1y(t-1) + \cdots + a_ny(t-n) + b_0u(t) + b_1u(t-1) + \cdots + b_mu(t-m) + v(t), \tag{1} $$

where $t$ is the integer sample number and $y(t)$, $u(t)$ and $v(t)$ denote the output, input and the noise term, respectively. This equation can be recast as

$$ y(t) = \theta^{*T}\phi(t) + v(t), \tag{2} $$

where

$$ \theta^* = [a_1, a_2, \ldots, a_n, b_0, \ldots, b_m]^T $$

is the vector of true parameters, and

$$ \phi(t) = [y(t-1), y(t-2), \ldots, y(t-n), u(t), u(t-1), \ldots, u(t-m)]^T $$

is the regressor vector. It is assumed that the noise is uniformly bounded in magnitude, i.e., there exists a known $\gamma_0 > 0$ such that for all $t$,

$$ v^2(t) \le \gamma_0^2. \tag{3} $$

Combining (2) and (3) yields

$$ (y(t) - \theta^{*T}\phi(t))^2 \le \gamma_0^2. \tag{4} $$

Let $S_t$ be a subset of the Euclidean space defined by

$$ S_t = \{\theta : (y(t) - \theta^T\phi(t))^2 \le \gamma_0^2\}. \tag{5} $$
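Membership in $S_t$ reduces to a single scalar test on the residual. As an illustration, the sketch below builds a hypothetical first-order ARX model of the form (1) with $n = 1$, $m = 1$, $\theta^* = [a_1, b_0, b_1]^T = [0.5, -0.3, 1.0]^T$, and noise drawn uniformly from $[-\gamma_0, \gamma_0]$ with $\gamma_0 = 0.2$, and confirms that $\theta^*$ lies in every $S_t$, as (4) requires. All names and values are hypothetical.

```python
import random


def in_S(theta, phi, y, gamma0):
    """Membership test for S_t in (5): (y(t) - theta^T phi(t))^2 <= gamma0^2."""
    residual = y - sum(a * b for a, b in zip(theta, phi))
    return residual ** 2 <= gamma0 ** 2


def simulate_arx(theta, gamma0, n_samples=200, seed=0):
    """Hypothetical ARX example of (1) with n = 1, m = 1: theta = [a1, b0, b1]."""
    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    y, data = [0.0], []
    for t in range(1, n_samples):
        phi = [y[t - 1], u[t], u[t - 1]]          # regressor phi(t)
        v = rng.uniform(-gamma0, gamma0)          # bounded noise, as in (3)
        y.append(sum(a * b for a, b in zip(theta, phi)) + v)
        data.append((phi, y[t]))
    return data


theta_true = [0.5, -0.3, 1.0]
data = simulate_arx(theta_true, gamma0=0.2)
# (4): the true parameter vector should lie in every S_t
print(all(in_S(theta_true, phi, yt, 0.2) for phi, yt in data))
```

A parameter vector far from $\theta^*$, such as the zero vector, falls outside at least some of the $S_t$; intersecting these sets over time is what the ellipsoidal bounds approximate.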

The OBE algorithms start off with a large ellipsoid, $E_0$, which contains all admissible values of the model parameter vector $\theta^*$. After the first observation $y(1)$ is acquired, an ellipsoid is found which bounds the intersection of $E_0$ and the convex polytope $S_1$. To hasten convergence, this ellipsoid must be optimized in some sense, say minimum volume or minimum trace [2,7], or by any other criterion [6]. Denoting the optimal ellipsoid by $E_1$, one can proceed exactly as before with future observations and obtain a sequence of optimal bounding ellipsoids $\{E_t\}$. The center of the ellipsoid $E_t$ can be taken as the parameter estimate at the $t$th instant and is denoted by $\theta(t)$. If at a particular time instant $t$ the resulting optimal bounding ellipsoid would be of a "smaller size", thereby implying that the data point $y(t)$ contains some fresh "information" regarding the parameter estimates, then the parameter estimates are updated. Otherwise $E_t$ is set equal to $E_{t-1}$, and the estimates are not updated. In essence, the recursive estimator consists of two modules, an information evaluator followed by an updating processor. At each data point, the received data proceed to the updating processor only if the information evaluator indicates that some fresh information is contained in the data. For details of the minimum volume OBE algorithm, one may refer to, e.g., [2,7,12].

The subsequent discussions will be focused on a particular OBE algorithm [6]. The optimization criterion for the OBE algorithm of [6] is defined in terms of a certain upper bound on the estimation error. Such a criterion yields several advantages over not only the minimum volume and minimum trace OBE algorithms, but also other membership set algorithms mentioned in [12]. The updating criterion is simpler, and the presence of an information-dependent updating/forgetting factor enables the algorithm to track slow time variations in the parameters. Analysis of the algorithm shows that if the input is sufficiently rich, as defined in [6], and the noise is uncorrelated with the inputs, then the prediction error is asymptotically bounded by the noise bound and the parameter estimation error is bounded by a quantity proportional to the noise bound. In addition, asymptotic cessation of updating is guaranteed in the fixed parameter case. These properties are not apparent in the other membership set algorithms.

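For the OBE algorithm of [6], the bounding ellipsoid at the $t$th instant is formulated as

$$E_t = \left\{\theta : [\theta - \theta(t)]^T P^{-1}(t)\,[\theta - \theta(t)] \le \sigma^2(t)\right\} \qquad (6)$$

for some positive definite matrix $P(t)$ and a non-negative scalar $\sigma^2(t)$. The size of the bounding ellipsoid is related to the scalar $\sigma^2(t)$ and the eigenvalues of $P(t)$. The update equations for $\theta(t)$, $P(t)$ and $\sigma^2(t)$, derived in [6], are as follows:

$$\theta(t) = \theta(t-1) + K(t)\,\delta(t), \qquad (7a)$$

$$K(t) = \frac{\lambda_t P(t-1)\phi(t)}{1 - \lambda_t + \lambda_t G(t)}, \qquad (7b)$$

$$\delta(t) = y(t) - \theta^T(t-1)\phi(t), \qquad (7c)$$

$$G(t) = \phi^T(t) P(t-1) \phi(t), \qquad (7d)$$

$$P(t) = \frac{1}{1-\lambda_t}\left[I - K(t)\phi^T(t)\right] P(t-1), \qquad (7e)$$

$$\sigma^2(t) = (1-\lambda_t)\,\sigma^2(t-1) + \lambda_t\gamma_0 - \frac{\lambda_t(1-\lambda_t)\,\delta^2(t)}{1 - \lambda_t + \lambda_t G(t)}. \qquad (7f)$$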

The optimal ellipsoid $E_t$, which bounds the intersection of $E_{t-1}$ and $S_t$, is defined in terms of an optimal value of the updating gain factor $\lambda_t$, where $0 \le \lambda_t \le \alpha < 1$, with $\alpha$ being a user-chosen upper bound on the updating gain factor. The optimum value of $\lambda_t$ is determined by minimization of $\sigma^2(t)$ with respect to $\lambda_t$ at every time instant. The minimization procedure results in a discerning update procedure. In particular, $\lambda_t$ is set equal to zero (no update) if

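$$\sigma^2(t-1) + \delta^2(t) \le \gamma_0. \qquad (8)$$

On the other hand, if (8) is not satisfied, then the optimal value of $\lambda_t$ is computed as follows:

$$\lambda_t = \min(\alpha, \nu_t), \qquad (9)$$

with

$$\beta(t) = \frac{\gamma_0 - \sigma^2(t-1)}{\delta^2(t)}$$

and

$$\nu_t = \begin{cases} \alpha, & \text{if } \delta^2(t) = 0, \\[4pt] \dfrac{1 - \beta(t)}{2}, & \text{if } G(t) = 1, \\[4pt] \dfrac{1}{1 - G(t)}\left[1 - \left(\dfrac{G(t)}{1 + \beta(t)\left(G(t) - 1\right)}\right)^{1/2}\right], & \text{if } 1 + \beta(t)\left(G(t) - 1\right) > 0, \\[4pt] \alpha, & \text{if } 1 + \beta(t)\left(G(t) - 1\right) \le 0. \end{cases} \qquad (10)$$

The recursions (7) and the selective update strategy, along with the initial values

$$P^{-1}(0) = I, \quad \theta(0) = 0 \quad \text{and} \quad \sigma^2(0) = 1/d \ \text{ with } d \ll 1, \qquad (11)$$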


form the OBE estimation algorithm. The value chosen for the upper bound $\gamma_0$ need not be a tight bound on the noise magnitude, since the parameter estimates are not affected by an overestimation of the noise bound [13, p. 52]. Overestimation of the noise bound, however, will cause the bounding ellipsoids to be larger. Underestimating the noise bound may cause $\sigma^2(t)$ to become negative at some instant, thereby causing the bounding ellipsoid to vanish. In this case, a recovery procedure may be activated to either increase the size of the ellipsoid $E_{t-1}$ or increase the width of $S_t$ by increasing $\gamma_0$.
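The complete recursion can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of [6]. It follows the update equations (7a)-(7f), the selective update rule (8)-(10) and the initial values (11) as written above, and it assumes NumPy. The function names (`updating_gain`, `obe_step`, `run_obe`) are hypothetical. The small tolerance used to detect $G(t) = 1$ and the explicit re-symmetrization of $P(t)$ are implementation choices added here for floating-point robustness.

```python
import numpy as np

def updating_gain(sigma2_prev, delta, G, gamma0, alpha):
    """Selective update rule (8)-(10): returns lambda_t (0.0 means 'no update')."""
    if sigma2_prev + delta ** 2 <= gamma0:          # (8): no fresh information
        return 0.0
    if delta == 0.0:
        nu = alpha
    else:
        beta = (gamma0 - sigma2_prev) / delta ** 2
        if abs(G - 1.0) < 1e-12:
            nu = (1.0 - beta) / 2.0
        elif 1.0 + beta * (G - 1.0) > 0.0:
            nu = (1.0 - np.sqrt(G / (1.0 + beta * (G - 1.0)))) / (1.0 - G)
        else:
            nu = alpha
    return min(alpha, nu)                            # (9)

def obe_step(theta, P, sigma2, phi, y, gamma0, alpha):
    """One time step of the OBE recursions (7a)-(7f)."""
    delta = y - theta @ phi                          # (7c) a priori prediction error
    G = phi @ P @ phi                                # (7d)
    lam = updating_gain(sigma2, delta, G, gamma0, alpha)
    if lam == 0.0:                                   # E_t = E_{t-1}: skip the update
        return theta, P, sigma2, lam
    denom = 1.0 - lam + lam * G
    K = lam * (P @ phi) / denom                      # (7b)
    theta = theta + K * delta                        # (7a)
    P = (P - np.outer(K, phi @ P)) / (1.0 - lam)     # (7e)
    P = 0.5 * (P + P.T)                              # keep P symmetric against round-off
    sigma2 = (1.0 - lam) * sigma2 + lam * gamma0 - lam * (1.0 - lam) * delta ** 2 / denom  # (7f)
    return theta, P, sigma2, lam

def run_obe(y, Phi, gamma0, alpha=0.5, d=1e-4):
    """Run OBE from the initial values (11): P(0) = I, theta(0) = 0, sigma^2(0) = 1/d."""
    p = Phi.shape[1]
    theta, P, sigma2 = np.zeros(p), np.eye(p), 1.0 / d
    history = []
    for y_t, phi_t in zip(y, Phi):
        theta, P, sigma2, lam = obe_step(theta, P, sigma2, phi_t, y_t, gamma0, alpha)
        history.append((theta.copy(), P.copy(), sigma2, lam))
    return history
```

For any $\lambda_t \in [0, 1)$, the updated ellipsoid contains $E_{t-1} \cap S_t$. A simple sanity check on simulated data with $v^2(t) \le \gamma_0$ is therefore that $[\theta^* - \theta(t)]^T P^{-1}(t) [\theta^* - \theta(t)] \le \sigma^2(t)$ holds at every instant, provided $\theta^* \in E_0$.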

A striking feature of the OBE algorithms is their similarity to the Recursive Weighted Least-Squares (RWLS) algorithm with forgetting factor. In fact, the OBE algorithm of [6] can be considered a special case of the RWLS algorithm with forgetting factor, with a weighting factor $\lambda_t$ and a forgetting factor $1 - \lambda_t$. However, the intelligent selection of the weighting factor $\lambda_t$ makes the actual behavior of the OBE algorithms quite different from that of the RWLS algorithm.
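The claimed equivalence can be checked numerically. The sketch below assumes NumPy, and its function names are hypothetical. It compares one OBE step from (7) with an explicit weighted least-squares step in information form, $P^{-1}(t) = (1-\lambda_t)P^{-1}(t-1) + \lambda_t\phi(t)\phi^T(t)$ and $P^{-1}(t)\theta(t) = (1-\lambda_t)P^{-1}(t-1)\theta(t-1) + \lambda_t\phi(t)y(t)$, for the same fixed $\lambda_t$. By the matrix inversion lemma, the two should agree up to round-off.

```python
import numpy as np

def obe_update(theta, P, phi, y, lam):
    """OBE recursions (7a)-(7e) for a given updating gain factor lam in (0, 1)."""
    G = phi @ P @ phi
    K = lam * (P @ phi) / (1.0 - lam + lam * G)
    theta_new = theta + K * (y - theta @ phi)
    P_new = (P - np.outer(K, phi @ P)) / (1.0 - lam)
    return theta_new, P_new

def rwls_update(theta, P, phi, y, lam):
    """Weighted LS in information form: forgetting factor 1 - lam, weight lam."""
    info_prev = np.linalg.inv(P)
    info = (1.0 - lam) * info_prev + lam * np.outer(phi, phi)
    rhs = (1.0 - lam) * info_prev @ theta + lam * phi * y
    P_new = np.linalg.inv(info)
    return P_new @ rhs, P_new

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
P_prev = A @ A.T + np.eye(4)            # an arbitrary positive definite P(t-1)
theta_prev = rng.standard_normal(4)
phi, y = rng.standard_normal(4), 0.7
th_obe, P_obe = obe_update(theta_prev, P_prev, phi, y, lam=0.3)
th_wls, P_wls = rwls_update(theta_prev, P_prev, phi, y, lam=0.3)
print(np.allclose(th_obe, th_wls), np.allclose(P_obe, P_wls))   # True True
```

The recursive form avoids the explicit matrix inverses; it is the data-dependent choice of $\lambda_t$, not the form of the update, that distinguishes OBE from RWLS.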

The RLS algorithms have become increasingly popular in the fields of adaptive signal processing and adaptive control. It is therefore worthwhile to investigate situations in which the use of the OBE algorithms would be preferred to the RLS algorithms. For example, cases in which the statistical nature of the noise is unknown, or in which the noise does not satisfy the usual stationarity and whiteness assumptions, seem particularly appropriate for the OBE algorithms. Milanese and Belforte [9] have demonstrated the superiority of the Minimum Uncertainty Interval Correct Estimator (MUICE) over the least-squares estimate for a third-order moving average model where the noise is proportional to the magnitude of the output. A comparison between the OMNE algorithm and least squares for a non-linear biological model has been presented in [14]. However, OMNE and MUICE are non-recursive and more computationally intensive than least-squares algorithms, and it is perhaps fairer to compare the RLS algorithms with the OBE algorithms, which have similar computational complexity. For the sake of illustration, we present an example below in which the noise is quasi-stationary [4], and compare the performance of the OBE algorithm of [6] to that of the standard unweighted RLS algorithm.

Example 1. The following ARX (2,2) model is considered:

$$y(t) = -0.4\,y(t-1) - 0.85\,y(t-2) - 0.2\,u(t) - 0.7\,u(t-1) + v(t),$$

where the measurable input $u(t)$ is white and uniformly distributed in $[-1, 1]$, and $v(t)$ is a sinusoid in white noise $w(t)$. Such a situation could arise when the observations are affected by power supply hum or other electromagnetic interference. The following model for $v(t)$ is assumed:

$$v(t) = (1 - \beta)\,w(t) + \beta\sin(\pi t/10).$$

The white noise sequence $w(t)$ is also uniformly distributed in $[-1, 1]$ and is uncorrelated with the input sequence. The value of $\beta$ is varied from 0 to 1, and for each value of $\beta$, ten Monte Carlo runs of the OBE and RLS algorithms are performed with data records of 500 points each. The upper bound on the updating gain factor is $\alpha = 0.5$ and the upper bound on the noise is $\gamma_0 = 1.0$. The final parameter estimation error $(\theta^* - \theta(500))^T(\theta^* - \theta(500))$ is averaged over the ten runs and is displayed in Fig. 1 for $\beta$ ranging from zero to one. Notice that the parameter estimates of the RLS algorithm are unacceptable for larger values of $\beta$. In contrast, the performance of the OBE algorithm is relatively constant over the range of $\beta$. The performance of the OBE algorithm has also been observed to be superior to that of the RLS algorithm in other cases in which the noise is impulsive and bursty [13].
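The following is a minimal sketch of the Example 1 setup, assuming NumPy. It generates data from the ARX (2,2) model with the sinusoid-plus-white-noise disturbance and evaluates the final squared parameter error of a standard unweighted RLS baseline, which is initialized, as an assumption, with $P(0) = 10^4 I$. The seed, the helper names and the chosen values of $\beta$ are illustrative. The printed numbers are not the results plotted in Fig. 1, and an OBE estimator would be evaluated on the same data for the comparison.

```python
import numpy as np

THETA_STAR = np.array([-0.4, -0.85, -0.2, -0.7])   # [a_1, a_2, b_0, b_1] of Example 1

def example1_data(beta, T=500, rng=None):
    """Generate (y, Phi, v) with v(t) = (1 - beta) w(t) + beta sin(pi t / 10)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(-1.0, 1.0, T)
    w = rng.uniform(-1.0, 1.0, T)
    v = (1.0 - beta) * w + beta * np.sin(np.pi * np.arange(T) / 10.0)   # |v(t)| <= 1
    y = np.zeros(T)
    Phi = np.zeros((T, 4))
    for t in range(T):
        Phi[t] = [y[t - 1] if t >= 1 else 0.0, y[t - 2] if t >= 2 else 0.0,
                  u[t], u[t - 1] if t >= 1 else 0.0]
        y[t] = THETA_STAR @ Phi[t] + v[t]
    return y, Phi, v

def rls(y, Phi, p0=1e4):
    """Standard unweighted RLS with P(0) = p0 * I and theta(0) = 0."""
    p = Phi.shape[1]
    theta, P = np.zeros(p), p0 * np.eye(p)
    for y_t, phi in zip(y, Phi):
        K = P @ phi / (1.0 + phi @ P @ phi)
        theta = theta + K * (y_t - theta @ phi)
        P = P - np.outer(K, phi @ P)
    return theta

rng = np.random.default_rng(0)
for beta in (0.0, 0.5, 1.0):
    errs = []
    for _ in range(10):                              # ten Monte Carlo runs
        y, Phi, _v = example1_data(beta, rng=rng)
        e = THETA_STAR - rls(y, Phi)
        errs.append(e @ e)                           # (theta* - theta(500))^T (theta* - theta(500))
    print(beta, np.mean(errs))
```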

In conclusion, it can be noted that the similarity of the OBE algorithms to the RLS algorithms facilitates the analysis of the algorithms and eases the development of numerically superior and faster implementations of the OBE algorithms. Analysis of finite precision effects in the OBE algorithm of [6] has been performed in [13,15], and upper bounds on the parameter estimation error due to finite word-length computations have been derived. It has also been shown that the time recursion for the matrix $P(t)$ in the OBE algorithm of [6] is less susceptible to round-off errors than the corresponding recursion in the RLS algorithm. As in the RLS case, Bierman's $UDU^T$ factorization can be performed straightforwardly to update the $P$ matrix in the OBE algorithm in a numerically stable fashion. Systolic array implementations of the algorithm have been reported in [16,17]. Thus there exists the potential to apply well-established techniques from the adaptive filtering and system identification literature to bounding ellipsoid algorithms.

Fig. 1. Mean-squared parameter estimation errors of the OBE and RLS algorithms in white noise mixed with sinusoidal noise (Example 1).

3.  Lattice  implementation 

Lattice-filter  implementations  [18]  of  adaptive  algorithms  have  become  popular  for  a  number 
of  reasons.  Among  others,  the  more  prominent  ones  are:  (1)  the  modular  structure  of  lattice 
filters  which  renders  them  particularly  suitable  for  VLSI  implementations;  (2)  the  low  sensitivity 
of  the  filter  to  numerical  perturbations  in  the  lattice  coefficients;  and  (3)  the  fact  that  the  lattice 
coefficients  are  independent  of  the  filter  order,  thus  making  it  possible  to  add  successive  lattice 
stages  or  subtract  existing  ones  without  recalculating  the  already  existing  coefficients.  In  this 
section, an outline of the lattice-filter formulation of the OBE algorithm of [6] is presented. This lattice-filter implementation appears to retain all the above-mentioned advantages. Details of the implementation and simulation results have been presented in [10] and [19].

Consider the following well-known RLS lattice recursions [18] for an AR model:

$$e_m(t) = e_{m-1}(t) - k^b_{m-1}(t-1)\,r_{m-1}(t-1), \quad m = 1, 2, \ldots, N, \qquad (12)$$

$$r_m(t) = r_{m-1}(t-1) - k^f_{m-1}(t-1)\,e_{m-1}(t), \quad m = 1, 2, \ldots, N, \qquad (13)$$

where $e_0(t) = r_0(t) = y(t)$, $e_m(t)$ is the forward prediction error of order $m$, $r_m(t)$ is the backward prediction error of order $m$, and $k^b_m$ and $k^f_m$ are, respectively, the $m$th backward and forward partial correlation (PARCOR) coefficients. Iterating (12) up to order $m = N$ yields

$$e_N(t) = y(t) - k^b_0(t-1)\,r_0(t-1) - k^b_1(t-1)\,r_1(t-1) - \cdots - k^b_{N-1}(t-1)\,r_{N-1}(t-1). \qquad (14)$$

Thus the $N$th-order predictor of $y(t)$ is

$$\hat{y}(t/N) = k^b_0(t-1)\,r_0(t-1) + k^b_1(t-1)\,r_1(t-1) + \cdots + k^b_{N-1}(t-1)\,r_{N-1}(t-1). \qquad (15)$$

Compare (15) to the equation for a transversal predictor given by

$$\hat{y}(t/N) = a_1(t-1)\,y(t-1) + a_2(t-1)\,y(t-2) + \cdots + a_N(t-1)\,y(t-N),$$

and let $Y_t = (0, \ldots, y(0), y(1), \ldots, y(t))$ and $R_{m,t} = (0, \ldots, r_m(0), r_m(1), \ldots, r_m(t))$. The optimal predictor for the RLS transversal case can be thought of, in geometrical terms, as the projection of $Y_t$ onto the regressor space spanned by $Y_{t-1}, \ldots, Y_{t-N}$. The backward error vectors $R_{0,t-1}, \ldots, R_{N-1,t-1}$ span exactly the same space, with the difference that they are mutually orthogonal, and the optimal predictor for the RLS lattice case is again the projection of $Y_t$ onto the above regressor space. Thus the predictors for the RLS lattice and transversal cases are identical.
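The equivalence can be checked for the special case of constant PARCOR coefficients. The Python sketch below assumes NumPy, and its coefficient values and helper names are hypothetical. It runs the order recursions (12)-(13) with zero initial conditions, converts the lattice to transversal coefficients by a step-up recursion on the error polynomials, and confirms that both forms produce the same predictions $\hat{y}(t/N)$.

```python
import numpy as np

def lattice_errors(y, kb, kf):
    """Order recursions (12)-(13) with constant PARCOR coefficients.

    Returns arrays e[m, t] and r[m, t] for m = 0..N; signals before t = 0 are zero.
    """
    N, T = len(kb), len(y)
    e = np.zeros((N + 1, T))
    r = np.zeros((N + 1, T))
    e[0], r[0] = y, y                                    # e_0(t) = r_0(t) = y(t)
    for t in range(T):
        for m in range(1, N + 1):
            r_prev = r[m - 1, t - 1] if t >= 1 else 0.0
            e[m, t] = e[m - 1, t] - kb[m - 1] * r_prev   # (12)
            r[m, t] = r_prev - kf[m - 1] * e[m - 1, t]   # (13)
    return e, r

def lattice_to_transversal(kb, kf):
    """Step-up: coefficients a_1..a_N of the transversal predictor equal to the lattice."""
    A, B = np.array([1.0]), np.array([1.0])  # e_m = A_m(q^-1) y and r_m = B_m(q^-1) y
    for kbm, kfm in zip(kb, kf):
        zB = np.concatenate(([0.0], B))      # q^-1 B_{m-1}
        Ae = np.concatenate((A, [0.0]))
        A, B = Ae - kbm * zB, zB - kfm * Ae
    return -A[1:]                            # e_N(t) = y(t) - sum_j a_j y(t-j)

rng = np.random.default_rng(0)
y = rng.standard_normal(100)
kb, kf = [0.5, -0.3, 0.2], [0.4, -0.25, 0.1]   # hypothetical PARCOR coefficients
e, r = lattice_errors(y, kb, kf)
y_hat_lattice = y - e[-1]                      # (15), via (14)
a = lattice_to_transversal(kb, kf)
y_hat_trans = np.array([sum(a[j] * y[t - 1 - j] for j in range(len(a)) if t - 1 - j >= 0)
                        for t in range(len(y))])
print(np.allclose(y_hat_lattice, y_hat_trans))   # True
```

The step-up recursion also shows why adding a lattice stage leaves the earlier coefficients untouched. A new stage only appends a term to the polynomials and does not alter $k_0, \ldots, k_{m-1}$.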

In general, the OBE estimates at every time step are not identical to the RLS estimates. Nevertheless, the approximate minimum mean square residual property of the ellipsoidal center [20] justifies the imposition of the lattice structure (12) and (13) on the forward and backward errors of different orders for the OBE algorithm. Therefore, in principle, the OBE algorithm can be used to calculate the transversal filter coefficients, and then the step-down procedure described in [18, Section 4.2] can be used to calculate the optimal PARCOR coefficients. However, this method will involve an excessive amount of computation. Furthermore, many of the advantageous features of the lattice structure mentioned earlier will be lost, since the transversal filter coefficients are being used to obtain the lattice coefficients. It is thus preferable to apply the OBE algorithm in such a way that, at every time step, the estimate of the PARCOR coefficients is obtained directly. This problem can be tackled by working in the space of PARCOR coefficients instead of the space of the transversal filter coefficients. More specifically, define the ellipsoid

$$E'_0 = \left\{\theta' : \theta'^T P'^{-1}_0\,\theta' \le 1/d\right\}$$

and the convex polytope

$$S'_t = \left\{\theta' : \left(y(t) - \theta'^T\phi'(t)\right)^2 \le \gamma_0\right\},$$

where

$$\theta' = \left(k^b_0, k^b_1, \ldots, k^b_{N-1}\right)^T, \qquad \phi'(t) = \left(r_0(t-1), r_1(t-1), \ldots, r_{N-1}(t-1)\right)^T,$$

$$P'_0 = I \quad \text{and} \quad d \ll 1.$$

If the true PARCOR coefficients are defined to be the ones obtained by applying the step-down procedure to the true AR parameters of the system, and if the true PARCOR coefficients have been used to recursively obtain $\phi'(t)$ from (12) and (13), then each one of the convex polytopes $S'_i$, $i = 1, 2, \ldots, t$, will contain the true backward PARCOR coefficient vector $\theta'^*$. This is because, with the true PARCOR coefficients, the $N$th-order forward error $e_N(t)$ in (14) equals the bounded noise term of the AR model. The OBE algorithm can now be applied with the parameter vector $\theta$ set equal to the PARCOR coefficient vector $\theta'$ and the regressor vector $\phi(t)$ set equal to the vector of backward errors $\phi'(t)$, thus yielding the time update for the backward PARCOR coefficients. However, it is clear from (13) that, in order to obtain the backward errors of different orders at time $t$, the forward PARCOR coefficients at time $t-1$ are required. The time update equations for the forward coefficients can be obtained as follows.

Iterating (13) yields

$$r_N(t) = y(t-N) - k^f_0(t-N)\,e_0(t-N+1) - k^f_1(t-N+1)\,e_1(t-N+2) - \cdots - k^f_{N-1}(t-1)\,e_{N-1}(t).$$

Since the backward errors are expected to be bounded, one can define a convex polytope in the space of the forward PARCOR coefficients as

$$S''(t) = \left\{\theta'' : \left(y(t-N) - \theta''^T\phi''(t)\right)^2 \le \gamma''_0\right\},$$

where

$$\theta'' = \left(k^f_0, k^f_1, \ldots, k^f_{N-1}\right)^T \quad \text{and} \quad \phi''(t) = \left(e_0(t-N+1), e_1(t-N+2), \ldots, e_{N-1}(t)\right)^T.$$

The bound $\gamma''_0$ is set higher than $\gamma_0$ to ensure that $S''(t)$ contains the true forward PARCOR coefficient vector. It is worth noting that the exact value of $\gamma''_0$ is not critical here since, according to our experience, the algorithm is relatively insensitive to the values of such bounds. The algorithm formulated by (7) with $\theta = \theta''$ and $\phi = \phi''$ can now be applied to obtain the time updates of the forward PARCOR coefficients. An important point to be noted here is that for the backward error recursions at time $t$, the estimates $k^f_0(t-1), k^f_1(t-1), \ldots, k^f_{N-1}(t-1)$ are required. However, the OBE time update at time $t-1$ has made available the coefficients $k^f_0(t-N), k^f_1(t-N+1), \ldots, k^f_{N-1}(t-1)$. The former set of coefficients thus has to be approximated by the latter set. This is a valid approximation for small $N$ (as verified by simulation), because then $k^f_m(t-1)$ is approximately equal to $k^f_m(t-N+m)$.
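The regressors used by the two OBE updates can be assembled from the stored lattice errors. The sketch below assumes NumPy, constant PARCOR coefficients and zero initial conditions, and all names and coefficient values in it are hypothetical. It builds $\phi'(t)$ and $\phi''(t)$ and checks the two identities behind $S'_t$ and $S''(t)$, namely $e_N(t) = y(t) - \theta'^T\phi'(t)$ and $r_N(t) = y(t-N) - \theta''^T\phi''(t)$.

```python
import numpy as np

def lattice_errors(y, kb, kf):
    """Order recursions (12)-(13) with constant PARCOR coefficients and zero initial conditions."""
    N, T = len(kb), len(y)
    e, r = np.zeros((N + 1, T)), np.zeros((N + 1, T))
    e[0], r[0] = y, y                                    # e_0(t) = r_0(t) = y(t)
    for t in range(T):
        for m in range(1, N + 1):
            r_prev = r[m - 1, t - 1] if t >= 1 else 0.0
            e[m, t] = e[m - 1, t] - kb[m - 1] * r_prev   # (12)
            r[m, t] = r_prev - kf[m - 1] * e[m - 1, t]   # (13)
    return e, r

def backward_regressor(r, t):
    """phi'(t) = (r_0(t-1), ..., r_{N-1}(t-1))^T."""
    N = r.shape[0] - 1
    return np.array([r[m, t - 1] if t >= 1 else 0.0 for m in range(N)])

def forward_regressor(e, t):
    """phi''(t) = (e_0(t-N+1), e_1(t-N+2), ..., e_{N-1}(t))^T."""
    N = e.shape[0] - 1
    return np.array([e[m, t - N + 1 + m] if t - N + 1 + m >= 0 else 0.0 for m in range(N)])

rng = np.random.default_rng(1)
y = rng.standard_normal(80)
kb = np.array([0.5, -0.3, 0.2])        # hypothetical backward PARCOR coefficients (theta')
kf = np.array([0.4, -0.25, 0.1])       # hypothetical forward PARCOR coefficients (theta'')
e, r = lattice_errors(y, kb, kf)
N = len(kb)
ok_b = all(np.isclose(e[N, t], y[t] - kb @ backward_regressor(r, t)) for t in range(len(y)))
ok_f = all(np.isclose(r[N, t], y[t - N] - kf @ forward_regressor(e, t)) for t in range(N, len(y)))
print(ok_b, ok_f)   # True True
```

With time-varying estimates, the forward identity only holds approximately. This is the approximation $k^f_m(t-1) \approx k^f_m(t-N+m)$ discussed above.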

For the stationary case, since the backward and forward coefficients are expected to be equal, the OBE algorithm needs to be applied only once to the forward error (i.e., to the backward coefficients). The algorithm complexity is thus the same as that of the direct implementation. In general, the computational complexity of the lattice implementation is twice that of the direct form, because the OBE algorithm is applied twice at every iteration. However, the order of computation is still $O(N^2)$.

4. Extension to ARMA models

Autoregressive moving average (ARMA) models are described by difference equations of the form

$$y(t) = a_1 y(t-1) + \cdots + a_n y(t-n) + w(t) + c_1 w(t-1) + \cdots + c_r w(t-r), \qquad (16)$$

where $y(t)$ is the output and $w(t)$ is an unobservable white noise sequence. This equation can be recast as

$$y(t) = \theta^{*T}\phi^*(t) + w(t), \qquad (17)$$

where

$$\theta^* = [a_1, a_2, \ldots, a_n, c_1, \ldots, c_r]^T \qquad (18)$$

is the vector of true parameters, and

$$\phi^*(t) = [y(t-1), y(t-2), \ldots, y(t-n), w(t-1), \ldots, w(t-r)]^T$$

is the true regressor vector. It is assumed that the noise is uniformly bounded in magnitude, i.e., there exists $\gamma_0 > 0$ such that

$$w^2(t) \le \gamma_0 \quad \text{for all } t. \qquad (19)$$

Since the values of $w(t)$ are unknown, the OBE algorithm, in its present form, cannot be used to estimate the parameters. However, if estimates of $w(t)$ are used in place of the actual values, as in the ELS algorithm, then the algorithm (7) can be used to construct a sequence of optimal bounding ellipsoids. A natural estimate of $w(t)$ is the a posteriori prediction error (also termed the residual by some authors)

$$\varepsilon(t) = y(t) - \theta^T(t)\phi(t), \qquad (20)$$

where now

$$\phi(t) = [y(t-1), y(t-2), \ldots, y(t-n), \varepsilon(t-1), \ldots, \varepsilon(t-r)]^T. \qquad (21)$$
The extended optimal bounding ellipsoid (EOBE) algorithm [13,21] thus consists of (7) and the same selective update strategy, with the true parameter vector and the regressor vector as defined in (18) and (21), respectively. The initial conditions (11) are modified to

$$P^{-1}(0) = M^{-1} I, \quad \theta(0) = 0 \quad \text{and} \quad \sigma^2(0) \le \gamma' \ \text{ with } M \gg 1. \qquad (22)$$

This choice of initial conditions still ensures that the initial ellipsoid $E_0$ will contain $\theta^*$ and makes the algorithm amenable to analysis. It also simplifies the formula for determining the updating gain factor. In particular, $\lambda_t$ is always less than unity; hence there is no need to introduce an upper bound for the updating gain factor. Also note that $\gamma'$ in (22) is different from $\gamma_0$ in (19).
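The following is a minimal EOBE sketch, assuming NumPy. It follows (7) with the regressor (21) built from the a posteriori residuals (20) and the initial conditions (22) with $\sigma^2(0) = \gamma'$. Two points are assumptions rather than statements from the text. First, the simplified gain formula that applies under (22) is not reproduced here, so the rule (8)-(10) is reused with $\gamma'$ in place of $\gamma_0$ and a cap $\alpha = 0.9$. Second, the all-zero regressor at $t = 0$ is skipped. The ARMA(1,1) usage data and all function names are hypothetical.

```python
import numpy as np

def updating_gain(sigma2_prev, delta, G, gamma, alpha):
    """Rule (8)-(10) with gamma in place of gamma_0 (assumption: gamma = gamma')."""
    if sigma2_prev + delta ** 2 <= gamma:
        return 0.0
    if delta == 0.0:
        return alpha
    beta = (gamma - sigma2_prev) / delta ** 2
    if abs(G - 1.0) < 1e-12:
        return min(alpha, (1.0 - beta) / 2.0)
    q = 1.0 + beta * (G - 1.0)
    if q <= 0.0:
        return alpha
    return min(alpha, (1.0 - np.sqrt(G / q)) / (1.0 - G))

def eobe(y, n, r, gamma_p, M=1e4, alpha=0.9):
    """Sketch of EOBE: recursions (7) with regressor (21) built from residuals (20)."""
    p = n + r
    theta, P, sigma2 = np.zeros(p), M * np.eye(p), gamma_p   # (22), sigma^2(0) = gamma'
    eps = np.zeros(len(y))                                    # a posteriori residuals (20)
    for t in range(len(y)):
        past_y = [y[t - i] if t - i >= 0 else 0.0 for i in range(1, n + 1)]
        past_e = [eps[t - j] if t - j >= 0 else 0.0 for j in range(1, r + 1)]
        phi = np.array(past_y + past_e)                       # (21)
        if phi.any():                                         # skip the all-zero regressor
            delta = y[t] - theta @ phi                        # (7c)
            G = phi @ P @ phi                                 # (7d)
            lam = updating_gain(sigma2, delta, G, gamma_p, alpha)
            if lam > 0.0:
                denom = 1.0 - lam + lam * G
                K = lam * (P @ phi) / denom                   # (7b)
                theta = theta + K * delta                     # (7a)
                P = (P - np.outer(K, phi @ P)) / (1.0 - lam)  # (7e)
                P = 0.5 * (P + P.T)
                sigma2 = (1.0 - lam) * sigma2 + lam * gamma_p - lam * (1.0 - lam) * delta ** 2 / denom  # (7f)
        eps[t] = y[t] - theta @ phi                           # (20): uses the updated theta(t)
    return theta, eps

rng = np.random.default_rng(0)
w = rng.uniform(-1.0, 1.0, 300)
y = np.zeros(300)
for t in range(300):                                          # hypothetical ARMA(1,1) data
    y[t] = w[t] + ((0.5 * y[t - 1] + 0.3 * w[t - 1]) if t >= 1 else 0.0)
theta_hat, eps = eobe(y, n=1, r=1, gamma_p=1.0)
print(theta_hat)   # to be compared with the true [a_1, c_1] = [0.5, 0.3]
```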


Analysis of the EOBE algorithm. It is easy to see that, since estimates of $w(t)$ are used in the regressor vector, there is no guarantee that all the convex polytopes $S_t$, $t = 1, 2, \ldots$, will contain $\theta^*$. However, it has been shown [21] that all the convex polytopes will contain $\theta^*$ if (i) $E_0$ contains $\theta^*$, (ii) the true moving average coefficients satisfy a certain upper bound (analogous to the Strictly Positive Real (SPR) condition in the ELS algorithm), and (iii) the threshold $\gamma'$ is chosen appropriately [21]. These are, of course, only sufficient conditions, and the algorithm has been observed to perform well in several examples where conditions (ii) and (iii) were violated.

Using this result, the following bounds on the prediction error and parameter estimation error can be obtained (see [13,21] for details).

(a) $\lim_{j \to \infty} \sigma^2(t_j)$ exists,

where $\{t_j\}$ is the subsequence of updating instants of the EOBE algorithm.

(b) Uniformly bounded a posteriori prediction errors: $\varepsilon^2(t)$ is bounded uniformly over all time instants $t$.

Furthermore, if a certain persistence of excitation condition holds, then for any finite $k$,

(c) $\lim_{t \to \infty} \|\theta(t) - \theta(t-k)\| = 0$.

(e) Asymptotically bounded a priori prediction errors: $\delta^2(t) \to [0, \gamma']$.

(f) Asymptotically bounded parameter estimation error:

$$\|\theta(t) - \theta^*\|^2 \to \left[0,\ 2\gamma_0\left(1 + \sum_{j=1}^{r} |c_j|\right)^2 \Big/ \alpha_4\right],$$

where $\gamma_0$ is as in (19) and $\alpha_4$ is a positive constant.


The above results do not require the system (16) to be stable or the noise sequence $w(t)$ to be white. However, our simulation experience has shown that the parameter estimates are usually not close to the true parameters if the noise is colored, but such is also the case for the ELS algorithm. The EOBE algorithm performs well when the noise sequence $w(t)$ is white. In particular, the transient performance of the algorithm for stable and unstable ARMA-type systems with $w(t)$ white appears to be superior to that of the ELS algorithm. This observation is illustrated by the following example.

A.K. Rao, Y.-F. Huang / OBE parameter estimation

Fig. 2. Mean-squared parameter estimation errors of the EOBE and ELS algorithms (Example 2).

Example 2. The following ARMA (3,3) model is considered:

$$y(t) = -0.6\,y(t-1) + 0.2\,y(t-2) + 0.4\,y(t-3) + w(t) - 0.22\,w(t-1) + 0.17\,w(t-2) - 0.1\,w(t-3).$$

The white noise sequence $w(t)$ is uniformly distributed in $[-1, 1]$. Ten Monte Carlo runs of the EOBE and ELS algorithms are performed with data records of 50 points each. The threshold is $\gamma' = 25$. The parameter estimation error at each instant, $(\theta^* - \theta(t))^T(\theta^* - \theta(t))$, and the a priori prediction error are averaged over the ten runs and displayed in Fig. 2 and Fig. 3, respectively. The parameter estimates of the ELS algorithm tend to wander outside the stability region in the transient stage, thus causing unacceptably high prediction error bursts. The inherent stability mechanism of the ELS algorithm, however, ensures that the estimates do return to the stability region. The transient estimation error of the EOBE algorithm, in contrast, is well behaved. This seems to provide a good incentive for employing EOBE, rather than ELS, when few data are available.

Fig. 3. Mean-squared prediction errors of the EOBE and ELS algorithms (Example 2).
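The Example 2 data can be generated as follows. This sketch assumes NumPy and uses hypothetical helper names. It also checks that the AR part is stable by computing the roots of $z^3 + 0.6z^2 - 0.2z - 0.4$.

```python
import numpy as np

A_COEF = np.array([-0.6, 0.2, 0.4])     # a_1, a_2, a_3 of Example 2
C_COEF = np.array([-0.22, 0.17, -0.1])  # c_1, c_2, c_3 of Example 2

def example2_data(T=50, rng=None):
    """Generate y(t) for the ARMA (3,3) model of Example 2 with w(t) ~ U[-1, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.uniform(-1.0, 1.0, T)
    y = np.zeros(T)
    for t in range(T):
        ar = sum(A_COEF[i] * y[t - 1 - i] for i in range(3) if t - 1 - i >= 0)
        ma = sum(C_COEF[j] * w[t - 1 - j] for j in range(3) if t - 1 - j >= 0)
        y[t] = ar + w[t] + ma
    return y, w

# Poles of the AR part: roots of z^3 - a_1 z^2 - a_2 z - a_3.
poles = np.roots(np.concatenate(([1.0], -A_COEF)))
print(np.max(np.abs(poles)) < 1.0)               # True: the AR part is stable

theta_star = np.concatenate((A_COEF, C_COEF))    # theta* of (18) for this example
y, w = example2_data(rng=np.random.default_rng(0))
```

An estimate whose AR part has a root outside the unit circle lies outside the stability region mentioned above. The same `np.roots` check can be applied to the first three components of any transient estimate $\theta(t)$.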


5.  Conclusion 

It has been shown that, on account of their low computational complexity and analytical tractability, the OBE algorithms can serve as alternatives to standard adaptive filtering algorithms in situations where the noise is unknown but bounded. As in the least-squares case, the OBE algorithm can be implemented in a lattice form and can thus acquire all the advantages of the lattice structure. The extension to the colored noise case is performed as in extended least squares, and sufficient conditions for "convergence" of the algorithm have been outlined. The transient performance of the algorithm, in terms of parameter estimation error and prediction error, has been observed to be superior to that of the ELS algorithm.
ANALYSIS OF FINITE PRECISION EFFECTS ON A RECURSIVE SET
MEMBERSHIP PARAMETER ESTIMATION ALGORITHM

Ashok K. Rao and Yih-Fang Huang

Abstract

Analysis of error propagation in an OBE algorithm is performed, which shows that the errors in the estimates due to an initial perturbation are bounded. Simulation results demonstrate that the OBE algorithm can perform better than the conventional RLS algorithm in small word-length environments. The analysis presented in the paper could also be applied to the finite precision analysis of recursive weighted least-squares algorithms.
I INTRODUCTION

Set Membership Parameter Estimation (SMPE) algorithms [1-3] are a class of estimation algorithms which yield a set of feasible parameter vectors consistent with the observations, model structure and noise constraints. This is in contrast to least-squares-type or stochastic-gradient-based algorithms, which compute point estimates of model parameters.

The SMPE algorithms do not assume any knowledge of the distribution or any other statistical properties of the noise process. However, it is assumed that the noise is bounded, either in magnitude or in energy. The performance of SMPE algorithms is often superior to that of least-squares algorithms when the noise process does not satisfy the usual whiteness and stationarity assumptions [4] and when the sample size is small [5]. Furthermore, these algorithms yield 100% confidence regions for the parameters, even for small sample sizes in the case of batch algorithms, and at every time instant with recursive algorithms.

The behavior of least-squares and stochastic-gradient-based adaptive filtering algorithms in limited precision environments has attracted a lot of attention [6], [7]. However, in the case of SMPE algorithms, the issue of finite word-length effects has been largely ignored until recently. In [8], the potential numerical problems which can arise with the exact cone updating (ECU) algorithm are discussed and a robust modification is suggested. In this paper, finite precision effects on one of the Optimal Bounding Ellipsoid (OBE) algorithms are studied through analysis and simulations. The OBE algorithms recursively obtain ellipsoidal outer bounds of the membership set of parameters of ARX models with bounded noise. The algorithms have the distinctive feature of a discerning update strategy.

A brief description of the OBE algorithm of [9] is given in Section II. A first-order analysis of the error propagation in the OBE algorithm is then performed, which shows that the error in the estimates at any time instant due to an initial perturbation is bounded. The finite precision effects are also analyzed from an alternate geometric point of view. Results of a fixed-point type simulation of the algorithm are presented, which show that the OBE algorithm yields consistently good estimates over a large range of word-lengths. In fact, the performance is superior to that of the RLS algorithm for small word-lengths.
II THE OBE ALGORITHM

The OBE algorithms [1],[9] estimate the coefficients of ARX processes described by

$$y(t) = \theta^{*T}\phi(t) + v(t), \qquad (2.1)$$

where

$$\phi(t) = [y(t-1), \ldots, y(t-n), u(t), u(t-1), \ldots, u(t-m)]^T \qquad (2.2)$$

is the regressor vector consisting of past outputs $\{y(t)\}$ and present and past inputs $\{u(t)\}$, and $\theta^*$ is the true parameter vector, specifically $\theta^* = [a_1, \ldots, a_n, b_0, \ldots, b_m]^T$ [1,9]. The noise sequence $\{v(t)\}$ is assumed to be uniformly bounded with a known bound $\gamma$, i.e.,

$$v^2(t) \le \gamma^2 \quad \text{for all } t. \qquad (2.3)$$

The OBE algorithms obtain, recursively, a "decreasing" sequence $\{E_t\}$ of optimal outer bounding ellipsoids in the $(n+m+1)$-dimensional parameter space. The ellipsoid $E_t$ can be expressed as

$$E_t = \left\{\theta \in \mathbb{R}^{n+m+1} : [\theta - \theta(t)]^T P^{-1}(t)\,[\theta - \theta(t)] \le \sigma^2(t)\right\}, \qquad (2.4)$$

where $P^{-1}(t)$ is a positive definite matrix and $\theta(t)$ is the center of the ellipsoid, which can be taken to be a point estimate of the parameter vector. The factor $\sigma^2(t)$ is a positive time-varying scalar which, along with $P(t)$, determines the size of $E_t$. Time recursions for $P(t)$, $\theta(t)$ and $\sigma^2(t)$ are given below; see [9] for a derivation of these equations.

$$P(t) = \frac{1}{1-\lambda_t}\left[P(t-1) - \frac{\lambda_t P(t-1)\phi(t)\phi^T(t)P(t-1)}{1 - \lambda_t + \lambda_t\phi^T(t)P(t-1)\phi(t)}\right], \qquad (2.5)$$

$$\sigma^2(t) = (1-\lambda_t)\,\sigma^2(t-1) + \lambda_t\gamma^2 - \frac{\lambda_t(1-\lambda_t)\,\delta^2(t)}{1 - \lambda_t + \lambda_t\phi^T(t)P(t-1)\phi(t)}, \qquad (2.6)$$

$$\theta(t) = \theta(t-1) + \lambda_t P(t)\phi(t)\left(y(t) - \phi^T(t)\theta(t-1)\right). \qquad (2.7)$$

The initial conditions are chosen to ensure that $\theta^* \in E_0$. A possible choice which improves the robustness of the algorithm to finite word-length effects is

$$P(0) = MI \quad \text{and} \quad \sigma^2(0) = \gamma^2, \quad \text{where } M \gg 1. \qquad (2.8)$$

The optimal ellipsoid $E_t$ is defined in terms of an optimal value of the updating factor $\lambda_t \in [0, \alpha]$, where $\alpha < 1$ is a user-chosen upper bound on the updating factor. For the OBE algorithm of [9], the optimum value is determined by minimization of $\sigma^2(t)$ with respect to $\lambda_t$ at every time instant. The minimization procedure results in a discerning update procedure. In particular,

$$\text{if } \sigma^2(t-1) + \delta^2(t) \le \gamma^2: \quad \lambda_t = 0 \ \text{(no update)}, \qquad (2.9)$$

$$\text{else:} \quad \lambda_t = \min\left(\alpha, \nu(t)\right), \qquad (2.10)$$

with

$$\nu(t) = \begin{cases} \dfrac{1 - \beta(t)}{2}, & \text{if } G(t) = 1, \\[4pt] \dfrac{1}{1 - G(t)}\left[1 - \left(\dfrac{G(t)}{1 + \beta(t)\left(G(t) - 1\right)}\right)^{1/2}\right], & \text{if } G(t) \ne 1, \end{cases} \qquad (2.11)$$

where

$$\delta(t) = y(t) - \theta^T(t-1)\phi(t), \qquad (2.12)$$

$$G(t) = \phi^T(t)P(t-1)\phi(t), \qquad (2.13)$$

and

$$\beta(t) = \frac{\gamma^2 - \sigma^2(t-1)}{\delta^2(t)}. \qquad (2.14)$$

The above recursive relations (2.5)-(2.7) and the updating factor formula (2.9)-(2.14) form the OBE estimation algorithm.
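One simple way to emulate a fixed-point environment of the kind studied in this paper is to round the stored state after every update. The sketch below assumes NumPy and does this for (2.5)-(2.14), rounding to a given number of fractional bits. It is an illustration, not the simulation setup used by the authors, and it rests on several assumptions. The data are not quantized, overflow is not modelled, and intermediate products are computed in double precision. The cases $\delta(t) = 0$ and $1 + \beta(t)(G(t)-1) \le 0$, which (2.11) does not cover, fall back to $\nu(t) = \alpha$, and $\theta(0) = 0$ is assumed. All function names are hypothetical.

```python
import numpy as np

def quantize(x, bits):
    """Round to a fixed-point grid with `bits` fractional bits (overflow is not modelled)."""
    scale = 2.0 ** bits
    return np.round(np.asarray(x, dtype=float) * scale) / scale

def updating_factor(sigma2, delta, G, gamma, alpha):
    """Rule (2.9)-(2.11) with (2.14); returns 0.0 when no update is needed."""
    if sigma2 + delta ** 2 <= gamma ** 2:            # (2.9)
        return 0.0
    if delta == 0.0:                                 # not covered by (2.11): assume nu = alpha
        return alpha
    beta = (gamma ** 2 - sigma2) / delta ** 2        # (2.14)
    if abs(G - 1.0) < 1e-12:                         # (2.11), G(t) = 1
        return min(alpha, (1.0 - beta) / 2.0)
    q = 1.0 + beta * (G - 1.0)
    if q <= 0.0:                                     # not covered by (2.11): assume nu = alpha
        return alpha
    return min(alpha, (1.0 - np.sqrt(G / q)) / (1.0 - G))   # (2.10)-(2.11)

def obe_step_fixed(theta, P, sigma2, phi, y, gamma, alpha, bits):
    """One OBE step (2.5)-(2.7); the new state is rounded to `bits` fractional bits."""
    delta = y - theta @ phi                          # (2.12)
    G = phi @ P @ phi                                # (2.13)
    lam = updating_factor(sigma2, delta, G, gamma, alpha)
    if lam == 0.0:
        return theta, P, sigma2
    denom = 1.0 - lam + lam * G
    Pphi = P @ phi
    P_new = (P - lam * np.outer(Pphi, Pphi) / denom) / (1.0 - lam)                          # (2.5)
    sigma2_new = (1.0 - lam) * sigma2 + lam * gamma ** 2 - lam * (1.0 - lam) * delta ** 2 / denom  # (2.6)
    theta_new = theta + lam * (P_new @ phi) * delta                                          # (2.7)
    return quantize(theta_new, bits), quantize(P_new, bits), float(quantize(sigma2_new, bits))

def run_fixed(y, Phi, gamma, bits, alpha=0.5, M=100.0):
    """Run the quantized OBE from (2.8): P(0) = M I, sigma^2(0) = gamma^2 (theta(0) = 0 assumed)."""
    p = Phi.shape[1]
    theta = np.zeros(p)
    P = quantize(M * np.eye(p), bits)
    sigma2 = float(quantize(gamma ** 2, bits))
    for y_t, phi_t in zip(y, Phi):
        theta, P, sigma2 = obe_step_fixed(theta, P, sigma2, phi_t, y_t, gamma, alpha, bits)
    return theta
```

Running `run_fixed` for decreasing values of `bits` on the same data, alongside a similarly quantized RLS, is one way to explore the word-length comparison described in the introduction.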


III ERROR PROPAGATION

The error propagation properties of the OBE algorithm are analyzed here by focusing on the propagation of a single error in $\theta(t)$ and $P(t)$ to future instants. Assume that at time instant $t_0$ there is a perturbation in the estimates due to round-off error, yielding $\theta'(t_0) = \theta(t_0) + \Delta\theta(t_0)$ and $P'(t_0) = P(t_0) + \Delta P(t_0)$, where the primed quantities are the perturbed ones. We investigate in this section the effect of these errors on the estimates $\theta'(t)$ and $P'(t)$ at $t > t_0$, assuming that the computations are performed with infinite precision. Similar studies have been performed by Ljung and Ljung [10] in their investigation of the error propagation properties of RLS algorithms. Though the update equations of the OBE algorithm are similar to those of the RLS algorithm, the presence of the updating factor as a discontinuous function of the estimates complicates the analysis. Employing a first-order perturbation analysis, an upper bound on the error in the estimates due to finite precision computations can be obtained as described below.
Theorem  1.  If  the  following  assumptions  hold: 

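(i) The matrix $P(t)$ is well conditioned, i.e., there exist positive $\eta_1$ and $\eta_2$ such that

$$0 < \eta_1 < \lambda_{\min}[P(t)] \quad \text{and} \quad \lambda_{\max}[P(t)] < \eta_2 \quad \text{for all } t > 0 \tag{3.1}$$

where $\lambda_{\min}[\cdot]$ and $\lambda_{\max}[\cdot]$ refer to the minimum and maximum eigenvalues, respectively.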

(ii) The ARX process is stable and has bounded inputs, thereby implying the existence of a positive $K$ such that

$$\phi^T(t)\,\phi(t) < K \quad \text{for all } t > 0 \tag{3.2}$$
(iii) The unperturbed algorithm yields bounded prediction errors, i.e., there exists an $h > 0$ such that

$$|\delta(t)| \le h \quad \text{for all } t > 0 \tag{3.3}$$

(iv) There exists an integer $M$ such that if the unperturbed algorithm has $M$ updates in an interval of time, then the perturbed version updates at least once in that interval.

(v) At the updating instants of the perturbed algorithm, a lower bound $\rho$ is set for the updating factor $\lambda_t'$, where $\rho$ is a suitably small positive number.

Then the error between the perturbed and unperturbed quantities at the updating instants of the perturbed algorithm is bounded as

i  -  -1  in 

IIAPCt  )ll  <  (IL  r  (l-p)M  1IAP(0)1I  +q;  (k+  1 )  max  AX  M  — -  .3  4^ 

Til  -  i<u<k  'u  p  ^  ^ 

IIA0(t.)ir  <  (l-p)‘‘"^‘^  II  Ae(gil+Ti.hK‘/2  jnaxlAX,  |~[l-(l-p)^’''^'']  + 

T],  '  J  P  (3.5) 

h  _2  ||AP(t.)ll 
Til  i<jsk  J 

where $\eta_1$ and $\eta_2$ are as in (3.1), $\lfloor x \rfloor$ denotes the largest integer not greater than $x$, and $\|\cdot\|$ is used for both the Euclidean vector norm and the compatible matrix norm.

Proof:  See  [11]. 

Remarks 

(1)  The  first  term  in  (3.4)  and  (3.5)  reveals  an  exponentially  decaying  effect  of  the  initial 
perturbation.  The  second  term  depends  on  the  error  introduced  by  the  initial  perturbation  in  the 
calculation  of  the  updating  factor.  The  additional  error  term  in  (3.5)  is  due  to  the  errors  in  P(t). 

(2) Assumptions (i) and (iii) have been shown to hold in [9] if the system input $u(t)$ and the noise $v(t)$ satisfy certain persistence-of-excitation-type conditions.

(3) Assumption (v) is a technical device required to ensure that the homogeneous parts of (3.4) and (3.5) are exponentially stable. If $\rho < 0.001$, then in practice the values of $\lambda_t$ at the updating instants will usually be larger than $\rho$, so the bound rarely comes into play.

(4) Note that the analysis of error propagation has ignored the effect of round-off errors in subsequent computations. However, since the homogeneous parts of (3.4) and (3.5) are exponentially stable, the errors at any time instant due to round-off errors created at previous time instants would be bounded [10].

IV  EFFECTS  ON  THE  BOUNDING  ELLIPSOID 
In this section, the effect of round-off errors (in one iteration) on the resulting bounding ellipsoid is studied. More specifically, we ask the following question: if $\theta^* \in E_{t-1}$, can errors in the computation of $E_t$ (i.e., in the computation of $\theta(t)$, $P(t)$ and $\sigma^2(t)$) cause $\theta^* \notin E_t$? Define $\tilde{\theta}(t) = \theta(t) - \theta^*$. Then from (2.7)

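$$\tilde{\theta}(t) = \tilde{\theta}(t-1) + \lambda_t P(t)\,\phi(t)\,\delta(t) + \Delta_1 \tag{4.1}$$

where $\Delta_1$ is the round-off error. Similarly, from (2.5a)

$$P^{-1}(t) = (1-\lambda_t)\,P^{-1}(t-1) + \lambda_t\,\phi(t)\,\phi^T(t) + \Delta_2 \tag{4.2}$$

and from (2.6)

$$\sigma^2(t) = (1-\lambda_t)\,\sigma^2(t-1) + \lambda_t\gamma^2 - \frac{\lambda_t(1-\lambda_t)\,\delta^2(t)}{1-\lambda_t+\lambda_t\,\phi^T(t)P(t-1)\phi(t)} - \Delta_3 \tag{4.3}$$

where $\Delta_2$ and $\Delta_3$ are the corresponding round-off errors. Define

$$A_t = \tilde{\theta}(t-1) + \lambda_t P(t)\,\phi(t)\,\delta(t) \tag{4.4}$$

$$B_t = (1-\lambda_t)\,P^{-1}(t-1) + \lambda_t\,\phi(t)\,\phi^T(t) \tag{4.5}$$

Then, after neglecting second and higher order terms in $\Delta_1$ and $\Delta_2$, it can be shown that

$$V(t) = A_t^T B_t A_t + A_t^T \Delta_2 A_t + \Delta_1^T B_t A_t + A_t^T B_t \Delta_1 \tag{4.6}$$

where

$$V(t) = \tilde{\theta}^T(t)\,P^{-1}(t)\,\tilde{\theta}(t) \tag{4.7}$$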

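Expanding $A_t^T B_t A_t$ and using (4.3) yields

$$V(t) - \sigma^2(t) = (1-\lambda_t)\left[V(t-1) - \sigma^2(t-1)\right] + \lambda_t\left[v^2(t) - \gamma^2\right] + 2\Delta_1^T B_t A_t + A_t^T \Delta_2 A_t + \Delta_3 \tag{4.8}$$

From the definition of $E_t$ it is clear that $\theta^* \in E_t$ if and only if $V(t) \le \sigma^2(t)$. Thus if the errors $\Delta_1$, $\Delta_2$ and $\Delta_3$ are large enough, it is possible that $\theta^* \notin E_t$. Since $\theta^* \in E_{t-1}$ implies $V(t-1) \le \sigma^2(t-1)$, a sufficient condition for $\theta^* \in E_t$ is

$$\left|2\Delta_1^T B_t A_t + A_t^T \Delta_2 A_t + \Delta_3\right| < \lambda_t\left[\gamma^2 - v^2(t)\right] \tag{4.9}$$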

If $\lambda_t = 0$, then, since no update occurs, $\theta^* \in E_t$ automatically. Condition (4.9) shows that if the errors due to finite word-length computations are small enough, then $\theta^* \in E_t$. Furthermore, by setting $\gamma^2$ higher than the actual bound on the noise, the robustness of the algorithm with respect to finite precision effects can be increased at the expense of increasing the size of the bounding ellipsoids.
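The first-order expansion (4.8) can be spot-checked numerically by injecting small artificial errors $\Delta_1$, $\Delta_2$, $\Delta_3$ into one update and comparing the exact value of $V(t)-\sigma^2(t)$ with the right-hand side of (4.8). The sketch below uses random, hypothetical values for the previous ellipsoid, the regressor and the noise sample, a symmetric $\Delta_2$, and the sign convention in which $\Delta_3$ is subtracted from $\sigma^2(t)$ so that it enters (4.8) with a plus sign. The mismatch should be of second order in the injected errors.

```python
import numpy as np

rng = np.random.default_rng(0)
N, gamma2, lam = 3, 1.0, 0.05          # hypothetical dimension, noise bound, updating factor

# Hypothetical previous ellipsoid E_{t-1} that contains theta_star
theta_star = rng.uniform(-1.0, 1.0, N)
P_prev = 0.5 * np.eye(N)
sigma2_prev = 1.0
theta_prev = theta_star + 0.1 * rng.uniform(-1.0, 1.0, N)

# One new sample from the model y(t) = theta*^T phi(t) + v(t)
phi = rng.uniform(-1.0, 1.0, N)
v = 0.3
y = phi @ theta_star + v
delta = y - phi @ theta_prev
G = phi @ P_prev @ phi

# Exact (error-free) quantities (4.4) and (4.5); B_t equals P^{-1}(t)
B = (1 - lam) * np.linalg.inv(P_prev) + lam * np.outer(phi, phi)
A = (theta_prev - theta_star) + lam * (np.linalg.inv(B) @ phi) * delta
sigma2_exact = (1 - lam) * sigma2_prev + lam * gamma2 - lam * (1 - lam) * delta**2 / (1 - lam + lam * G)

# Small injected "round-off" errors
eps = 1e-6
D1 = eps * rng.standard_normal(N)
M = eps * rng.standard_normal((N, N))
D2 = 0.5 * (M + M.T)                    # keep the perturbed P^{-1}(t) symmetric
D3 = eps * rng.standard_normal()

theta_tilde = A + D1                    # (4.1)
P_inv = B + D2                          # (4.2)
sigma2 = sigma2_exact - D3              # (4.3), sign convention assumed here
V = theta_tilde @ P_inv @ theta_tilde   # (4.7)

# Right-hand side of (4.8)
e_prev = theta_prev - theta_star
V_prev = e_prev @ np.linalg.inv(P_prev) @ e_prev
err_terms = 2 * D1 @ B @ A + A @ D2 @ A + D3
rhs = (1 - lam) * (V_prev - sigma2_prev) + lam * (v**2 - gamma2) + err_terms

print(abs((V - sigma2) - rhs))                               # second order in eps
print(abs(err_terms) < lam * (gamma2 - v**2), V <= sigma2)   # (4.9) and membership
```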
V SIMULATION STUDIES

A fixed-point implementation of the OBE algorithm was simulated by assigning a fixed number of bits (ibit) to represent the fractional part of the algorithmic variables. By varying ibit, a fairly accurate portrayal of the behavior of the algorithm in a real-world restricted word-length environment can be obtained. A similar scheme has been used in [12] to characterize the performance of the RLS algorithm. The noise sequence $\{v(t)\}$ and the input sequence $\{u(t)\}$ are generated by a pseudo-random number generator with a uniform distribution in $[-1.0, 1.0]$. The noise bound $\gamma^2$ is set equal to 1.0. A value of $\alpha = 0.1$ was used since it yielded a satisfactory convergence rate and inhibited overflows in the update equation for $P(t)$. The parameter estimates are obtained by applying the OBE, RLS and EWLS (RLS with weighting factor $\lambda = 0.99$) algorithms to 1000-point data sequences. For the OBE algorithm, the centers of the optimal bounding ellipsoids are taken to be the estimates. Ten runs of the algorithms are performed on the same model but with different noise sequences. The number of bits used for the fractional part, ibit, is varied from 16 down to 6 bits, and the average of the parameter error $\|\theta(1000) - \theta^*\|^2$ is computed for each value of ibit.
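The fixed-point emulation described above can be approximated by rounding each algorithmic variable to a grid of spacing $2^{-\text{ibit}}$ after every operation. The helper below is a hypothetical sketch of that rounding step only: it assumes round-to-nearest and does not limit the integer part, so the overflows reported for small word-lengths are not reproduced.

```python
import numpy as np


def quantize(x, ibit):
    """Round x to the nearest multiple of 2**-ibit (ibit fractional bits).

    Assumes round-to-nearest; the integer part is unbounded, so overflow is
    not modelled.
    """
    scale = 2.0 ** ibit
    return np.round(np.asarray(x, dtype=float) * scale) / scale


x = np.linspace(-1.0, 1.0, 10001)
for ibit in (16, 10, 6):
    worst = np.max(np.abs(quantize(x, ibit) - x))
    print(ibit, worst, 2.0 ** -(ibit + 1))   # worst-case error vs. half an LSB
```

In an emulation of this kind, each intermediate result of the recursions (for example the updated $P(t)$, $\sigma^2(t)$ and $\theta(t)$) would be passed through such a function before being reused.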

Example 5.1 (Fig. 5.1) An ARX(2,3) process

$$y(t) = 1.6\,y(t-1) - 0.83\,y(t-2) + 0.14\,u(t) + u(t-1) + 0.16\,u(t-2) + v(t)$$

The average tap error of the OBE algorithm appears constant as ibit varies from 16 to 8 bits. The P matrix became negative definite for ibit = 6. The RLS and EWLS algorithms do not work well for ibit < 10. In fact, P became indefinite for ibit < 14 in the EWLS case.

Example 5.2 (Fig. 5.2) An ARX(10,10) process

The OBE algorithm worked well for ibit > 12. However, for smaller values, P became indefinite and overflows occurred. For the RLS case, P became indefinite for ibit < 16. In order to study the performance of the OBE algorithm at smaller word-lengths, a UDU' factorization of the P matrix was performed. The OBE update equations are identical to the update equations of the weighted RLS algorithm with weight $\lambda_t$ and forgetting factor $1-\lambda_t$, and hence the UDU' form of the OBE algorithm can easily be developed [13, p. 334]. The UDU' form of the OBE algorithm is then compared to the UDU' form of the RLS algorithm. The simulation results show that for larger word-lengths, the performance of the RLS algorithm is superior. For smaller values of ibit, however, the average parameter estimation error is about the same for both the OBE and the RLS algorithms.

Discussions 

Example 5.2 shows that the performances of the UDU' versions of the OBE and RLS algorithms are comparable at smaller word-lengths. The superior performance of the straightforward implementation of the OBE algorithm, as compared to the RLS or EWLS algorithms at smaller word-lengths, is therefore primarily due to the superior numerical properties of the recursion for the matrix $P(t)$. The update equation for the RLS algorithm with a forgetting factor $\lambda$ is

$$P(t) = \frac{1}{\lambda}\left[I - \frac{P(t-1)\,\phi(t)\,\phi^T(t)}{\lambda + \phi^T(t)P(t-1)\phi(t)}\right]P(t-1) \tag{5.1}$$

The corresponding equation for the OBE algorithm can be rewritten as

$$P(t) = \frac{1}{1-\lambda_t}\left[I - \frac{P(t-1)\,\phi(t)\,\phi^T(t)}{\dfrac{1-\lambda_t}{\lambda_t} + \phi^T(t)P(t-1)\phi(t)}\right]P(t-1) \tag{5.2}$$

Since $1-\lambda_t$ plays the same role in the OBE algorithm as $\lambda$ does in the RLS algorithm, the only difference between (5.1) and (5.2) is that the factor $(1-\lambda_t)/\lambda_t$ appears in the denominator of the term within brackets in (5.2), as opposed to the corresponding term $\lambda$ in (5.1). The degradation of performance occurs primarily because the term within brackets becomes indefinite on account of round-off errors. Since $\lambda_t$ is usually much smaller than unity, $(1-\lambda_t)/\lambda_t$ is large, and the term being subtracted from the identity matrix in (5.2) is much smaller than the one in (5.1). Thus $P(t)$ in the RLS algorithm has a greater tendency to become indefinite than $P(t)$ in the OBE algorithm. This observation has been confirmed by examining the eigenvalues of $P(t)$ for runs in which the RLS algorithm performed poorly.
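To make the comparison of (5.1) and (5.2) concrete, the sketch below evaluates both updates for one hypothetical regressor, with $\lambda = 0.99$ for the RLS form and an assumed small OBE gain $\lambda_t = 0.05$. It checks each update against the inverse of its information-form recursion and reports the only non-unit eigenvalue of the bracketed matrix, $c/(c+G)$ with $c = \lambda$ or $c = (1-\lambda_t)/\lambda_t$; values close to zero are where round-off can make the bracket indefinite.

```python
import numpy as np


def rls_P_update(P_prev, phi, lam):
    """(5.1): RLS covariance update with forgetting factor lam."""
    Pphi = P_prev @ phi
    K = np.outer(Pphi, phi) / (lam + phi @ Pphi)          # term subtracted from I
    return (np.eye(len(phi)) - K) @ P_prev / lam, K


def obe_P_update(P_prev, phi, lam_t):
    """(5.2): OBE covariance update with updating factor lam_t."""
    Pphi = P_prev @ phi
    K = np.outer(Pphi, phi) / ((1.0 - lam_t) / lam_t + phi @ Pphi)
    return (np.eye(len(phi)) - K) @ P_prev / (1.0 - lam_t), K


rng = np.random.default_rng(1)
P_prev = np.eye(4)
phi = rng.uniform(-1.0, 1.0, 4)
G = phi @ P_prev @ phi

P_rls, K_rls = rls_P_update(P_prev, phi, lam=0.99)
P_obe, K_obe = obe_P_update(P_prev, phi, lam_t=0.05)     # hypothetical small lambda_t

# (5.2) should match the inverse of (1 - lam_t) P^{-1}(t-1) + lam_t phi phi^T
P_direct = np.linalg.inv(0.95 * np.linalg.inv(P_prev) + 0.05 * np.outer(phi, phi))
print(np.allclose(P_obe, P_direct))

# Non-unit eigenvalue of the bracketed matrix I - K in each recursion
print("RLS:", 0.99 / (0.99 + G), "OBE:", 19.0 / (19.0 + G))
```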
VI CONCLUSION

The analysis of error propagation in the OBE algorithm has shown that the algorithm is stable with respect to small computational errors. As in the RLS case, the robustness of the algorithm is due to the presence of an updating gain/forgetting factor. Stability of the algorithm has also been viewed from an alternate geometric approach. The analysis shows that the bounding ellipsoids are valid bounds for the membership sets as long as computational errors are not too large, and that the robustness of the algorithm can be increased by increasing the value of the noise bound. Simulation results show that the OBE algorithm is indeed stable for moderate word-lengths and that the mean parameter estimation error is relatively constant over a wide range of word-lengths. In fact, it was observed that the performance of the OBE algorithm is superior to that of the RLS algorithm for small word-lengths.
Figure 5.1 Average parameter estimation error for the OBE and RLS algorithms for an ARX(2,3) process
Figure 5.2 Average parameter estimation error for the OBE and RLS algorithms for an ARX(10,10) process
Abstract

Recently  there  seems  to  have  been  a  resurgence  of  interest  in  recursive  parameter 
bounding  algorithms.  These  algorithms  are  applicable  when  the  noise  is  bounded  and  the 
bound  is  known  to  the  user.  One  of  the  advantages  of  such  algorithms  is  that  100% 
confidence  regions  (which  are  optimal  in  some  sense)  for  the  parameter  estimates  can  be 
obtained  at  every  time  instant,  rather  than  asymptotically  as  in  the  least-squares  type 
algorithms.  Another  advantage  is  that  these  recursive  algorithms  have  the  inherent 
capability  of  implementing  discerning  updates,  particularly  that  of  allowing  no  updates  of 
parameter estimates in the recursion. This paper investigates tracking properties of one
such algorithm, referred to as the DHOBE algorithm. Conditions which ensure the
existence  of  these  100%  confidence  regions  in  the  face  of  small  model  parameter  variations 
are  derived.  For  larger  parameter  variations,  it  is  shown  that  the  existence  of  the  100% 
confidence  regions  is  guaranteed  asymptotically.  A  modification  is  also  proposed  here  to 
enable  the  algorithm  to  track  large  variations  in  model  parameters.  Simulation  results  show 
that  in  general,  the  modified  algorithm  has  tracking  performance  comparable,  and  in  some 
cases  superior,  to  the  exponentially  weighted  recursive  least-squares  algorithm. 
I. INTRODUCTION


Performance analysis of adaptive filtering algorithms is usually done by assuming that the unknown system being modeled is time-invariant. However, in practice, adaptive filters are often used in time-varying environments. It is thus important to investigate the performance of these algorithms when the system model parameters are allowed to vary with time. A considerable amount of attention has been paid to this problem in the adaptive filtering literature, with analyses of varying rigor performed mainly for the LMS and RLS algorithms; see, e.g., [1-5].

This paper investigates tracking properties of a recursive estimation algorithm, referred to hereafter as the DHOBE (Dasgupta-Huang Optimal Bounding Ellipsoid) algorithm [6]. This algorithm belongs to a class of bounded-error estimation algorithms termed Set-Membership Parameter Estimation (SMPE) algorithms [7], [8]. The membership set is a set of parameter estimates which are compatible with the model of the underlying process, the assumptions on noise, and the observation data. At first glance the DHOBE algorithm appears to be very similar to the recursive least-squares (RLS) algorithm. However, in contrast to the RLS algorithm, which obtains an optimal solution (in the sense of minimum mean-square estimation error) to the underlying problem, the DHOBE algorithm is developed using a set-theoretic framework, namely, the notion of optimal bounding ellipsoids. This causes the algorithm to behave quite differently from the RLS algorithm in many ways. In addition, the algorithm incorporates a data-dependent forgetting factor which results in a discerning update strategy.

In the case of time-varying systems, it is important to ensure that the time-varying true parameters $\{\theta^*(t)\}$ are contained in the bounding ellipsoids $\{E_t\}$ of the DHOBE algorithm. In this paper, such conditions will be derived. It will also be shown that if a jump in the true parameter vector $\theta^*(t)$ causes it to fall outside the bounding ellipsoid, then, provided that the jump is not too large, the bounding ellipsoids will move towards $\theta^*(t)$ and eventually enclose $\theta^*(t)$ again. A rescue scheme is proposed which will guarantee the existence of bounding ellipsoids in the face of large parameter variations. Some techniques for applying different parameter bounding algorithms to time-varying systems have been reported by Norton and Mo in [9]. One of the techniques suggested for OBE-type algorithms is to use a fixed scaling factor to inflate the bounding ellipsoid with every new data point. Another technique, which can be used if prior knowledge of the parameter increments is available, is to vector-sum the bounding ellipsoid with the set describing the parameter variation [9]. If the extent of parameter variation is unknown, as is often the case, the first technique will have to use a large scaling factor to cope with possibly large parameter variations, and consequently the parameter bounds will be loose. In contrast, the rescue procedure described in this paper can automatically detect and accurately compensate for large parameter jumps.

Simulation results are presented to show that the DHOBE algorithm is able to track slow and abrupt variations in the parameters. The tracking performance, in terms of parameter estimation error, is comparable to that of the RLS algorithm with a forgetting factor. Abrupt changes in the parameters can in some cases be tracked better by the DHOBE algorithm than by the RLS algorithm.

II.  THE  DHOBE  ALGORITHM 

One of the seminal works in the estimation of parameter bounds is that of Fogel and Huang [10]. The algorithm of [10] recursively obtains ellipsoidal outer bounds to the membership set. The model structure considered is the following ARX model:

$$y(t) = \theta^{*T}\phi(t) + v(t) \tag{2.1}$$

where

$$\theta^* = [a_1\ a_2\ \cdots\ a_n\ b_0\ \cdots\ b_m]^T$$

is the true parameter vector and

$$\phi(t) = [y(t-1)\ y(t-2)\ \cdots\ y(t-n)\ u(t)\ u(t-1)\ \cdots\ u(t-m)]^T$$

is the measurable regressor vector. The noise $v(t)$ is assumed to be uniformly bounded in magnitude with a known bound $\gamma$, i.e.,

$$v^2(t) \le \gamma^2 \quad \text{for all } t \tag{2.2}$$
Assume that at time instant $t-1$ the exact membership set is outer bounded by the ellipsoid $E_{t-1}$ described by

$$E_{t-1} = \left\{\theta \in \mathbb{R}^N : [\theta - \theta(t-1)]^T P^{-1}(t-1)\,[\theta - \theta(t-1)] \le \sigma^2(t-1)\right\} \tag{2.3}$$

where $N = n+m+1$, $P^{-1}(t-1)$ is a positive definite matrix, and $\theta(t-1)$ is the center of the ellipsoid. At time instant $t$, the observation $y(t)$ yields a set $S_t$ which is a degenerate ellipsoid in the parameter space, namely

$$S_t = \left\{\theta \in \mathbb{R}^N : [y(t) - \theta^T\phi(t)]^2 \le \gamma^2\right\} \tag{2.4}$$

From (2.1) and (2.2) it is clear that $S_t$ contains the true parameter vector. An ellipsoid $E_t$ which contains the intersection of $E_{t-1}$ and $S_t$ is then given by [10]

$$E_t = \left\{\theta \in \mathbb{R}^N : (1-\lambda_t)[\theta-\theta(t-1)]^T P^{-1}(t-1)[\theta-\theta(t-1)] + \lambda_t[y(t) - \theta^T\phi(t)]^2 \le (1-\lambda_t)\sigma^2(t-1) + \lambda_t\gamma^2\right\} \tag{2.5}$$

where $\lambda_t$ is a positive time-varying updating gain. Note that $(1-\lambda_t)$ can be regarded as a forgetting factor. The formation of the bounding ellipsoid $E_t$, which contains the intersection of an ellipsoid $E_{t-1}$ and the set $S_t$, is illustrated by means of a two-dimensional example in Figure 1. By performing some algebraic manipulations on (2.5), an expression for $E_t$ can be obtained as

$$E_t = \left\{\theta \in \mathbb{R}^N : [\theta - \theta(t)]^T P^{-1}(t)\,[\theta - \theta(t)] \le \sigma^2(t)\right\} \tag{2.6}$$

where

$$P^{-1}(t) = (1-\lambda_t)\,P^{-1}(t-1) + \lambda_t\,\phi(t)\,\phi^T(t) \tag{2.7}$$

$$\sigma^2(t) = (1-\lambda_t)\,\sigma^2(t-1) + \lambda_t\gamma^2 - \frac{\lambda_t(1-\lambda_t)\,[y(t) - \phi^T(t)\theta(t-1)]^2}{1-\lambda_t+\lambda_t\,\phi^T(t)P(t-1)\phi(t)} \tag{2.8}$$

$$\theta(t) = \theta(t-1) + \lambda_t\,P(t)\,\phi(t)\,[y(t) - \phi^T(t)\theta(t-1)] \tag{2.9}$$

Using the matrix inversion lemma in (2.7) yields

$$P(t) = \frac{1}{1-\lambda_t}\left[P(t-1) - \frac{\lambda_t\,P(t-1)\phi(t)\phi^T(t)P(t-1)}{1-\lambda_t+\lambda_t\,\phi^T(t)P(t-1)\phi(t)}\right] \tag{2.10}$$


Equations (2.6)-(2.9) characterize the update of the bounding ellipsoids. The center $\theta(t)$ of the bounding ellipsoid $E_t$ can be taken to be a point estimate of the parameter vector. Note that different values of $\lambda_t$ yield different bounding ellipsoids [10]. To ensure convergence, $\lambda_t$ needs to be chosen to optimize the bounding ellipsoids in some sense, and clearly, different optimization criteria lead to different OBE algorithms.
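The ellipsoid update (2.7)-(2.10) for a given gain can be written compactly. The sketch below is an illustration, not the authors' code: the function name `ellipsoid_update` is hypothetical, it assumes $P(t-1)$ is symmetric (so $P\phi\phi^TP$ is the outer product of $P\phi$ with itself), and it takes $\lambda_t$ as an input rather than optimizing it.

```python
import numpy as np


def ellipsoid_update(theta, P, sigma2, phi, y, lam, gamma2):
    """One bounding-ellipsoid update (2.7)-(2.10) for a given gain 0 <= lam < 1."""
    delta = y - phi @ theta                      # a priori prediction error
    Pphi = P @ phi
    G = phi @ Pphi                               # phi^T P(t-1) phi
    denom = 1.0 - lam + lam * G
    P_new = (P - lam * np.outer(Pphi, Pphi) / denom) / (1.0 - lam)            # (2.10)
    sigma2_new = ((1.0 - lam) * sigma2 + lam * gamma2
                  - lam * (1.0 - lam) * delta**2 / denom)                     # (2.8)
    theta_new = theta + lam * (P_new @ phi) * delta                           # (2.9)
    return theta_new, P_new, sigma2_new


rng = np.random.default_rng(3)
theta_star = np.array([0.5, -0.2])
theta, P, sigma2 = np.zeros(2), np.eye(2), 4.0       # hypothetical E_{t-1} containing theta_star
phi = rng.uniform(-1.0, 1.0, 2)
y = phi @ theta_star + 0.7                           # noise sample within gamma = 1
theta_new, P_new, sigma2_new = ellipsoid_update(theta, P, sigma2, phi, y, 0.3, 1.0)

e = theta_new - theta_star
print(e @ np.linalg.inv(P_new) @ e <= sigma2_new)   # theta_star stays in E_t
```

Because (2.5) is a convex combination of the inequalities defining $E_{t-1}$ and $S_t$, the membership check holds for any gain in $[0, 1)$; the choice of gain only affects the size of $E_t$.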

In the DHOBE algorithm, the updating gain $\lambda_t$ is chosen to minimize $\sigma^2(t)$ at every instant $t$. This usually has the effect of decreasing the size of the ellipsoid from iteration to iteration, though there is no guarantee that the size will be minimized. This choice of $\lambda_t$ has yielded good results experimentally and has, in addition, simplified the convergence and tracking analysis of the algorithm. The minimization procedure yields the following updating criterion [6]:

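$$\lambda_t = 0 \quad \text{(i.e., no update)} \qquad \text{if } \sigma^2(t-1) + \delta^2(t) \le \gamma^2 \tag{2.11}$$

where $\delta(t)$ is the a priori prediction error, namely,

$$\delta(t) = y(t) - \phi^T(t)\,\theta(t-1) \tag{2.12}$$

Otherwise, if $\sigma^2(t-1) + \delta^2(t) > \gamma^2$, then the optimum value of $\lambda_t$ is nonzero and can be calculated according to

$$\lambda_t = \min(\alpha, \nu_t) \tag{2.13}$$

where

$$\nu_t = \begin{cases}
\alpha, & \text{if } \delta^2(t) = 0 \quad \text{(2.13.a)} \\[1ex]
\dfrac{1-\beta(t)}{2}, & \text{if } G(t) = 1 \quad \text{(2.13.b)} \\[1ex]
\dfrac{1}{1-G(t)}\left[1 - \sqrt{\dfrac{G(t)}{1+\beta(t)[G(t)-1]}}\right], & \text{if } 1+\beta(t)[G(t)-1] > 0 \quad \text{(2.13.c)} \\[1ex]
\alpha, & \text{if } 1+\beta(t)[G(t)-1] \le 0 \quad \text{(2.13.d)}
\end{cases}$$

and $\alpha$ is a user-chosen upper bound on $\lambda_t$ satisfying

$$0 < \alpha < 1 \tag{2.14}$$

$$G(t) = \phi^T(t)\,P(t-1)\,\phi(t) \tag{2.15}$$

and

$$\beta(t) = \frac{\gamma^2 - \sigma^2(t-1)}{\delta^2(t)} \tag{2.16}$$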

The initial conditions are chosen to ensure that $\theta^* \in E_0$. A possible choice is $P(0) = I$, $\theta(0) = 0$ and $\sigma^2(0) = 1/\epsilon^2$, where $\epsilon \ll 1$.

The above equations (2.8)-(2.16) define the recursions of the DHOBE algorithm. In [6], some convergence-type properties, such as convergence of the parameter estimates to a ball and boundedness of the prediction error, have been shown for time-invariant systems. In [11] and [12], an extension of this algorithm was developed for ARMA parameter estimation, and similar convergence properties have been shown to hold.
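Combining the gain rule (2.11)-(2.16) with the updates (2.8)-(2.10) gives a complete recursion. The following is a minimal sketch, not the authors' code: the names `dhobe_gain` and `run_dhobe` are hypothetical, and the test data borrow the ARX(1,1) settings of Section V ($a=-0.5$, $b=1$, uniform input and noise on $[-1,1]$, $\alpha=0.2$, $\gamma^2=1$, $\sigma^2(0)=100$) with an assumed record length of 1000 samples. For a time-invariant $\theta^*$ and a correct noise bound, every ellipsoid should contain $\theta^*$.

```python
import math
import numpy as np


def dhobe_gain(sigma2_prev, delta, G, gamma2, alpha):
    """Optimal updating gain of the DHOBE algorithm, (2.11)-(2.16)."""
    if sigma2_prev + delta**2 <= gamma2:            # (2.11): no update
        return 0.0
    if delta == 0.0:                                 # (2.13.a)
        nu = alpha
    else:
        beta = (gamma2 - sigma2_prev) / delta**2     # (2.16)
        c = 1.0 + beta * (G - 1.0)
        if math.isclose(G, 1.0):                     # (2.13.b)
            nu = (1.0 - beta) / 2.0
        elif c > 0.0:                                # (2.13.c)
            nu = (1.0 - math.sqrt(G / c)) / (1.0 - G)
        else:                                        # (2.13.d)
            nu = alpha
    return min(alpha, nu)                            # (2.13)


def run_dhobe(ys, phis, gamma2, alpha, sigma2_0):
    """Run the recursions from theta(0) = 0, P(0) = I, sigma^2(0) = sigma2_0."""
    theta = np.zeros(phis.shape[1])
    P = np.eye(phis.shape[1])
    sigma2 = sigma2_0
    history = []
    for y, phi in zip(ys, phis):
        delta = y - phi @ theta                      # (2.12)
        Pphi = P @ phi
        G = phi @ Pphi                               # (2.15)
        lam = dhobe_gain(sigma2, delta, G, gamma2, alpha)
        if lam > 0.0:
            denom = 1.0 - lam + lam * G
            P = (P - lam * np.outer(Pphi, Pphi) / denom) / (1.0 - lam)             # (2.10)
            sigma2 = ((1.0 - lam) * sigma2 + lam * gamma2
                      - lam * (1.0 - lam) * delta**2 / denom)                      # (2.8)
            theta = theta + lam * (P @ phi) * delta                                # (2.9)
        history.append((theta.copy(), P.copy(), sigma2))
    return history


# ARX(1,1) settings borrowed from Section V; the record length T is an assumption
rng = np.random.default_rng(0)
a, b, T = -0.5, 1.0, 1000
u = rng.uniform(-1.0, 1.0, T)
v = rng.uniform(-1.0, 1.0, T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = a * y[t - 1] + b * u[t] + v[t]
phis = np.column_stack([y[:-1], u[1:]])              # phi(t) = [y(t-1), u(t)]
hist = run_dhobe(y[1:], phis, gamma2=1.0, alpha=0.2, sigma2_0=100.0)

theta_star = np.array([a, b])
inside = [(th - theta_star) @ np.linalg.inv(Pt) @ (th - theta_star) <= s2 + 1e-9
          for th, Pt, s2 in hist]
print(all(inside), hist[-1][0])
```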


III.  ANALYSIS  OF  TRACKING  CHARACTERISTICS 
As  mentioned  earlier,  tracking  in  the  context  of  OBE  algorithms  for  parameter 
estimation  will  mean  ensuring  that  the  time  varying  true  parameter  vector  is  contained  in  the 
bounding  ellipsoid.  The  theorems  below  present  conditions  under  which  parameter 
tracking  can  be  accomplished. 

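Theorem 1. A sufficient condition for $\theta^*(t) \in E_t$ is

$$[\theta^*(t) - \theta(t-1)]^T P^{-1}(t-1)\,[\theta^*(t) - \theta(t-1)] \le \sigma^2(t-1) \tag{3.1}$$

Proof: If $\theta^*(t) \in E_{t-1}$, then, since $\theta^*(t) \in S_t$ and $E_t \supseteq E_{t-1} \cap S_t$, it follows that $\theta^*(t) \in E_t$. From (2.3), $\theta^*(t) \in E_{t-1}$ is equivalent to (3.1).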

Theorem 2. At any time instant $t$, the true parameter $\theta^*(t) \in E_t$ if and only if

$$[\theta^*(t) - \theta(t-1)]^T P^{-1}(t-1)\,[\theta^*(t) - \theta(t-1)] \le \sigma^2(t-1) + \frac{\lambda_t}{1-\lambda_t}\left[\gamma^2 - v^2(t)\right] \tag{3.2}$$

where $v(t)$ is the noise term in (2.1).

Proof: Subtracting $\theta^*(t)$ from both sides of (2.9) yields

$$\theta(t) - \theta^*(t) = \theta(t-1) - \theta^*(t) + \lambda_t P(t)\,\phi(t)\,\delta(t) \tag{3.3}$$


Define the following quadratic function in $\theta^*(t)$:

$$V(t) = [\theta(t) - \theta^*(t)]^T P^{-1}(t)\,[\theta(t) - \theta^*(t)]$$

Using (2.7) and (3.3), it is straightforward, though tedious, to show that

$$V(t) = (1-\lambda_t)[\theta(t-1) - \theta^*(t)]^T P^{-1}(t-1)[\theta(t-1) - \theta^*(t)] + \lambda_t v^2(t) - \frac{\lambda_t(1-\lambda_t)\,\delta^2(t)}{(1-\lambda_t) + \lambda_t G(t)} \tag{3.4}$$


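Using (2.8) in (3.4) yields

$$V(t) - \sigma^2(t) = (1-\lambda_t)[\theta(t-1) - \theta^*(t)]^T P^{-1}(t-1)[\theta(t-1) - \theta^*(t)] + \lambda_t[v^2(t) - \gamma^2] - (1-\lambda_t)\sigma^2(t-1) \tag{3.5}$$

Since $\theta^*(t) \in E_t$ if and only if $V(t) \le \sigma^2(t)$, dividing (3.5) by $1-\lambda_t > 0$ gives (3.2). ∇∇∇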
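Identities (3.4) and (3.5), and the equivalence stated in Theorem 2, can be spot-checked numerically. The sketch below uses arbitrary hypothetical values for $P(t-1)$, $\theta(t-1)$, $\sigma^2(t-1)$, the gain $\lambda_t$ and a shifted true parameter $\theta^*(t)$; it is a consistency check of the algebra, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
N, gamma2, lam = 3, 1.0, 0.15                  # hypothetical dimension, noise bound, gain

P_prev = np.diag([0.5, 1.0, 2.0])              # P(t-1)
theta_prev = rng.uniform(-1.0, 1.0, N)         # centre theta(t-1)
sigma2_prev = 2.0                              # sigma^2(t-1)
theta_star = rng.uniform(-1.0, 1.0, N)         # true parameter theta*(t) after a change
phi = rng.uniform(-1.0, 1.0, N)
v = 0.4                                        # noise sample with v^2 <= gamma^2

y = phi @ theta_star + v                       # (2.1)
delta = y - phi @ theta_prev                   # (2.12)
G = phi @ P_prev @ phi                         # (2.15)
D = 1.0 - lam + lam * G

P_inv = (1.0 - lam) * np.linalg.inv(P_prev) + lam * np.outer(phi, phi)          # (2.7)
theta = theta_prev + lam * np.linalg.solve(P_inv, phi) * delta                  # (2.9)
sigma2 = (1.0 - lam) * sigma2_prev + lam * gamma2 - lam * (1.0 - lam) * delta**2 / D  # (2.8)

e_prev = theta_prev - theta_star
W = e_prev @ np.linalg.solve(P_prev, e_prev)   # left-hand side of (3.2)
V = (theta - theta_star) @ P_inv @ (theta - theta_star)

gap_34 = V - ((1.0 - lam) * W + lam * v**2 - lam * (1.0 - lam) * delta**2 / D)
gap_35 = (V - sigma2) - ((1.0 - lam) * W + lam * (v**2 - gamma2) - (1.0 - lam) * sigma2_prev)
in_Et = V <= sigma2
crit_32 = W <= sigma2_prev + lam / (1.0 - lam) * (gamma2 - v**2)
print(abs(gap_34), abs(gap_35), in_Et == crit_32)
```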
It is easy to see from Theorem 1 that if the true parameter $\theta^*(t)$ is constant for all $t$, then the bounding ellipsoids obtained by the DHOBE algorithm enclose $\theta^*$ at all time instants. This is a property that all well-devised set-membership estimation algorithms should have when applied to the estimation of time-invariant parameters. If, on the other hand, $\theta^*(t)$ is time-varying, and if at some time instant $t_k$, $\theta^*(t_k)$ is found to be outside the bounding ellipsoid $E_{t_k}$, then it cannot have been included in $E_{t_k-1}$. Theorem 2 then demarcates the region in which $\theta^*(t)$ can migrate without loss of tracking. This region is shown in Figure 2 for a two-dimensional case. This theorem also shows that by choosing $\gamma^2$ to be larger than the actual bound, say $\bar{\gamma}^2$, on $v^2(t)$, it is possible to increase the tracking capability of the algorithm. The next theorem gives an upper bound on the maximum variation in the parameters for which tracking is guaranteed.


Theorem 3. If $\theta^*(t-1) \in E_{t-1}$ and $\lambda_t \neq 0$, then $\theta^*(t) \in E_t$ if

IIA(t)ll<-, . . .  i  [■—  ^ [y'  -  y'~  (t)] 

A  .  rp-'rt-m  ^  i-/i.  rF‘(t-i)r^ 


1  _ 

+a-(t-l)  p-Va-(t-l)  j 


(3.6)

where

$$\Delta(t) = \theta^*(t) - \theta^*(t-1) \tag{3.7}$$

and $\lambda_{\min}$ and $\lambda_{\max}$ denote, respectively, minimum and maximum eigenvalues, and $\|\cdot\|$ denotes the usual Euclidean norm. The quantity $\bar{\gamma}^2$ is the actual bound on $v^2(t)$, and the threshold $\gamma^2$ that is needed for evaluating the optimal updating gain via (2.11) and (2.16) is chosen to be larger than $\bar{\gamma}^2$.
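Proof: It is straightforward to show that

$$[\theta(t-1) - \theta^*(t)]^T P^{-1}(t-1)\,[\theta(t-1) - \theta^*(t)] = V(t-1) + \Delta^T(t)P^{-1}(t-1)\Delta(t) - 2\Delta^T(t)P^{-1}(t-1)\tilde{\theta}(t-1) \tag{3.8}$$

where $V(t)$ has been defined previously and

$$\tilde{\theta}(t-1) = \theta(t-1) - \theta^*(t-1)$$

Substituting (3.8) into (3.5) and using the fact that $v^2(t) \le \bar{\gamma}^2$ yields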
It can be seen from (3.6) that if $\lambda_t = 0$, then the difference between $\gamma^2$ and $\bar{\gamma}^2$ cannot be exploited to increase the tracking capability of the algorithm. In this case, $\theta^*(t) \in E_t$ if and only if $\theta^*(t) \in E_{t-1}$. Thus if $\theta^*(t)$ jumps out of $E_{t-1}$ and no updates are performed at future time instants $t+i$, then $\theta^*(t+i) \notin E_{t+i} = E_{t-1}$, and the parameter may never be tracked. However, it can be argued that an update will be performed within a finite interval of time. This is shown heuristically by examining the expression for the magnitude of the prediction error

$$|\delta(t)| = \left|[\theta^*(t) - \theta(t-1)]^T\phi(t) + v(t)\right|$$

Assume that no updates are performed for a large interval of time, say, from time instant $t$ to time instant $t + N_1$. From (2.11) it then follows that

$$\left|[\theta^*(t+i) - \theta(t-1)]^T\phi(t+i) + v(t+i)\right| \le [\gamma^2 - \sigma^2(t-1)]^{1/2}, \qquad i = 0, 1, \ldots, N_1$$

If the input and noise sequences are sufficiently rich, then the regressor vector $\phi(t)$ will span the parameter space in all directions, and so $[\theta^*(t+i) - \theta(t-1)]^T\phi(t+i)$ will not be arbitrarily small for all $i \in [0, N_1]$. If $|v(t+i)|$ is close to its true upper bound $\bar{\gamma}$ for some $i$ in the same interval, and if $\{v(t)\}$ is sufficiently uncorrelated with the input $\{u(t)\}$, then the above inequality will be violated and an update will be performed. It is also clear that to ensure that an update is performed eventually (i.e., that the above inequality is violated), the threshold $\gamma^2$ should not be chosen much larger than $\bar{\gamma}^2$.

If the parameter variation is such that (3.2) is violated, then $\theta^*(t) \notin E_t$. The next theorem shows that if $\theta^*(t)$ remains fixed after it jumps out of $E_t$, and if the jump is not large enough to cause the subsequent ellipsoids $E_{t+i}$, $i > 0$, to vanish, then the DHOBE algorithm guarantees that the true parameter will be tracked (enclosed) in finite time.

Theorem 4. Assume that the parameter variation at time instant $t$ causes $\theta^*(t) \notin E_t$. Assume further that:

(1) After this variation, the parameter remains constant (i.e., the jump parameter case).

(2) $\sigma^2(t+i) > 0$ for all $i > 0$.

(3) The algorithm does not stop updating.

(4) A lower bound $\rho$ is imposed on $\lambda_t$ at all updating instants.

Then there exists an $N_1 > 0$, which depends on the amount of parameter variation and on the actual and user-set noise bounds, such that $\theta^*(t) \in E_{t+N_1}$.

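Proof: Since $\theta^*(t) \notin E_t$, define

$$\eta = [\theta(t) - \theta^*(t)]^T P^{-1}(t)\,[\theta(t) - \theta^*(t)] - \sigma^2(t) > 0 \tag{3.14}$$

Assumption (1) implies that $\Delta(t+N_1) = \cdots = \Delta(t+1) = 0$ for arbitrary positive $N_1$. Substituting in (3.9) and iterating from $t+N_1$ back to $t+1$ yields

$$V(t+N_1) - \sigma^2(t+N_1) = \eta\prod_{i=t+1}^{t+N_1}(1-\lambda_i) + \sum_{i=t+1}^{t+N_1} q_i\,[v^2(i) - \gamma^2] \tag{3.15}$$

where $q_i$ is defined as

$$q_i = \begin{cases} \lambda_i \displaystyle\prod_{j=i+1}^{t+N_1}(1-\lambda_j), & \text{if } i < t+N_1 \\[1ex] \lambda_i, & \text{if } i = t+N_1 \end{cases}$$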

Assumption (3) ensures that infinitely many of the $\lambda_{t+i}$, $i > 0$, are non-zero, and by Assumption (4) each of these is at least $\rho$. Hence the first term on the right-hand side of (3.15) tends to zero. Since the second term on the right-hand side of (3.15) is negative, the difference $V(t+N_1) - \sigma^2(t+N_1)$ eventually becomes negative as $N_1$ increases. Thus there exists an $N_1$ such that

$$V(t+N_1) - \sigma^2(t+N_1) < 0 \tag{3.16}$$

thereby ensuring that $\theta^*(t) \in E_{t+N_1}$. ∇∇∇

IV.  A  RESCUE  PROCEDURE 

In many cases, when the parameter jump is large or the ellipsoid has shrunk to a very small size, the intersection of $E_{t-1}$ and $S_t$ can be empty. This situation is illustrated in Fig. 3. In such cases, $\sigma^2(t)$ will become negative, indicating that a bounding ellipsoid could not be constructed. To circumvent such a failure of the algorithm, a rescue procedure is proposed. If at any time instant $t$, $\sigma^2(t)$ becomes negative, then $\sigma^2(t-1)$ is increased by an appropriate amount, thereby increasing the size of $E_{t-1}$ so that the intersection of $S_t$ and this enlarged $E_{t-1}$ will no longer be empty. As such, an ellipsoid $E_t$ can be constructed. Alternatively, $\gamma^2$ could be increased to permit a non-empty intersection. However, the former procedure is preferable because it causes $\theta(t)$ to migrate towards $\theta^*(t)$, thereby reducing the parameter estimation error. The rescue procedure is similar to the covariance resetting technique used in RLS algorithms to cope with time-varying systems. However, in the RLS case a jump in the parameters has to be detected by some other means before the covariance matrix can be reset, whereas for the DHOBE algorithm $\sigma^2(t)$ becoming negative is an automatic indicator of a jump. The amount of increase in $\sigma^2(t-1)$ required to make $\sigma^2(t)$ positive in such a case is now calculated.

Recall that the optimal updating gain $\lambda_t$ is the one which minimizes $\sigma^2(t)$. The minimum occurs either at a stationary point of $\sigma^2(t)$ or at one of the boundaries $\lambda_t = 0$ and $\lambda_t = \alpha$. Since it is assumed that a failure occurs when $\sigma^2(t-1) > 0$ and $\sigma^2(t) < 0$, an update has to occur at $t$, and so $\lambda_t \neq 0$. The case in which the minimum occurs at a stationary point strictly inside the interval $(0, \alpha)$ and the case in which the minimum occurs at $\lambda_t = \alpha$ are considered separately.

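Case 1. $\partial\sigma^2(t)/\partial\lambda_t = 0$ and $0 < \nu_t < \alpha$

From (2.13) it is clear that this case occurs if and only if $1 + \beta(t)[G(t)-1] > 0$ and $\nu_t < \alpha$. Setting the derivative of $\sigma^2(t)$ in (2.8) to zero yields

$$\gamma^2 - \sigma^2(t-1) - \frac{(1-\lambda_t)\,\delta^2(t)}{1-\lambda_t+\lambda_t G(t)} + \frac{\lambda_t G(t)\,\delta^2(t)}{[1-\lambda_t+\lambda_t G(t)]^2} = 0$$

Substituting $\sigma^2(t-1)$ from above into (2.8) yields

$$\sigma^2(t) = \gamma^2 - \frac{(1-\lambda_t)^2\,\delta^2(t)}{[1-\lambda_t+\lambda_t G(t)]^2} \tag{4.1}$$

Thus $\sigma^2(t)$ is negative if and only if

$$|\delta(t)| > \frac{1-\lambda_t+\lambda_t G(t)}{1-\lambda_t}\,\gamma \tag{4.2}$$

On substituting for $\lambda_t$ from (2.13c) and (2.13b), (4.2) can be expressed, respectively, as

$$|\delta(t)| > \begin{cases} \dfrac{[G(t)-1]\,\gamma}{\sqrt{G(t)\,[1+\beta(t)(G(t)-1)]} - 1}, & \text{if } G(t) \neq 1 \\[2ex] \dfrac{2\gamma}{1+\beta(t)}, & \text{if } G(t) = 1 \end{cases} \tag{4.3}$$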


Using the definition of β(t) from (2.16) in (4.3) and manipulating terms yields a necessary and sufficient condition for σ²(t) to be negative in terms of σ²(t−1):

$$\sigma^2(t-1) < \frac{\delta^2(t)+\gamma^2-2\gamma|\delta(t)|}{G(t)} = K_1 \quad \text{if } G(t)\neq 1$$

$$\sigma^2(t-1) < \delta^2(t)+\gamma^2-2\gamma|\delta(t)| = K_1 \quad \text{if } G(t)=1$$
Note that the last inequality was obtained because ν_t = (1 − β(t))/2 < 1, hence 1 + β(t) > 0. Thus, if the calculated value of σ²(t) is negative, the rescue procedure will replace σ²(t−1) by K₁ + ζ, where ζ is a positive constant, thereby increasing the size of E_{t−1}. The optimum updating gain will then be recalculated, and the resulting value will be used to calculate σ²(t), θ(t) and P(t). Our simulation studies have shown that using a value of ζ = 1 yields satisfactory results.
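The Case 1 test and rescue step can be checked numerically. The Python sketch below illustrates them under stated assumptions and is not the authors' implementation. It assumes the following, none of which is reproduced in this section:

- (2.8) has the standard DHOBE form σ²(t) = (1 − λ_t)σ²(t−1) + λ_tγ² − λ_t(1 − λ_t)δ²(t)/(1 − λ_t + λ_tG(t)).
- β(t) = (γ² − σ²(t−1))/δ²(t).
- ν_t is the stationary point of this expression in λ_t.

The function names are hypothetical, and the constraint ν_t < α is not enforced.

```python
import math

def sigma2_update(s2_prev, gamma2, delta, G, lam):
    # Assumed form of (2.8): sigma^2(t) as a function of the updating gain lam.
    return ((1 - lam) * s2_prev + lam * gamma2
            - lam * (1 - lam) * delta ** 2 / (1 - lam + lam * G))

def stationary_gain(s2_prev, gamma2, delta, G):
    # Stationary point nu_t of sigma^2(t) in lam (assumed forms of (2.13b)/(2.13c)),
    # with beta(t) = (gamma^2 - sigma^2(t-1)) / delta^2(t).
    beta = (gamma2 - s2_prev) / delta ** 2
    if abs(G - 1.0) < 1e-12:
        return (1 - beta) / 2
    return (1 - math.sqrt(G / (1 + beta * (G - 1)))) / (1 - G)

def case1_threshold(gamma2, delta, G):
    # K_1 = (|delta| - gamma)^2 / G(t); equals delta^2 + gamma^2 - 2 gamma |delta| when G = 1.
    return (abs(delta) - math.sqrt(gamma2)) ** 2 / G

def rescue_case1(s2_prev, gamma2, delta, G, zeta=1.0):
    # Case 1 only: assumes the stationary gain lies strictly inside (0, alpha).
    nu = stationary_gain(s2_prev, gamma2, delta, G)
    s2 = sigma2_update(s2_prev, gamma2, delta, G, nu)
    if s2 < 0:  # a negative sigma^2(t) signals a parameter jump
        s2_prev = case1_threshold(gamma2, delta, G) + zeta
        nu = stationary_gain(s2_prev, gamma2, delta, G)
        s2 = sigma2_update(s2_prev, gamma2, delta, G, nu)
    return s2_prev, nu, s2

print(rescue_case1(0.5, 1.0, 3.0, 4.0))  # first entry is K_1 + zeta = 2.0
```

With γ² = 1, δ(t) = 3 and G(t) = 4, K₁ = 1. Starting from σ²(t−1) = 0.5 therefore gives a negative σ²(t), and the rescue raises σ²(t−1) to K₁ + ζ = 2, after which σ²(t) is positive.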


Case 2. λ_t = α

In this case, from (2.8), σ²(t) is negative if and only if

$$\delta^2(t) > \left[1-\alpha+\alpha G(t)\right]\left[\frac{\sigma^2(t-1)}{\alpha} + \frac{\gamma^2}{1-\alpha}\right]$$

Thus σ²(t) is negative if and only if

$$\sigma^2(t-1) < \alpha\left[\frac{\delta^2(t)}{1-\alpha+\alpha G(t)} - \frac{\gamma^2}{1-\alpha}\right] = K_2$$

In this case σ²(t−1) would be replaced by K₂ + ζ, and the value of the updating gain would be recalculated and used to calculate θ(t) and P(t).
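Case 2 admits an even simpler check. With λ_t = α, the assumed update is linear in σ²(t−1) with slope 1 − α, so replacing σ²(t−1) by K₂ + ζ gives σ²(t) = (1 − α)ζ at that gain. The sketch below assumes the same standard form of (2.8) as the Case 1 analysis. It evaluates σ²(t) only at λ_t = α and does not repeat the recalculation of the optimal gain. Function names are hypothetical.

```python
def sigma2_update(s2_prev, gamma2, delta, G, lam):
    # Assumed form of (2.8): sigma^2(t) as a function of the updating gain lam.
    return ((1 - lam) * s2_prev + lam * gamma2
            - lam * (1 - lam) * delta ** 2 / (1 - lam + lam * G))

def case2_threshold(alpha, gamma2, delta, G):
    # K_2 = alpha * [delta^2 / (1 - alpha + alpha G) - gamma^2 / (1 - alpha)]
    return alpha * (delta ** 2 / (1 - alpha + alpha * G) - gamma2 / (1 - alpha))

def rescue_case2(s2_prev, alpha, gamma2, delta, G, zeta=1.0):
    # With lambda_t = alpha, sigma^2(t) < 0 exactly when sigma^2(t-1) < K_2.
    if sigma2_update(s2_prev, gamma2, delta, G, alpha) < 0:
        s2_prev = case2_threshold(alpha, gamma2, delta, G) + zeta
    return s2_prev, sigma2_update(s2_prev, gamma2, delta, G, alpha)

# alpha = 0.2 and gamma^2 = 1.0 as in the simulations; delta and G are illustrative.
print(case2_threshold(0.2, 1.0, 5.0, 1.0))     # about 4.75
print(rescue_case2(4.0, 0.2, 1.0, 5.0, 1.0))   # sigma^2(t-1) raised to K_2 + 1
```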


V.  SIMULATION  EXAMPLES 

The tracking properties of the DHOBE algorithm are studied for an ARX(1,1) model

y(t) = a y(t−1) + b u(t) + v(t)

The nominal values for the parameters were a = −0.5 and b = 1.0. The noise sequence {v(t)} and the input sequence {u(t)} were both generated by a pseudo-random number generator with a uniform distribution in [−1, 1). This corresponds to a signal-to-noise ratio (SNR) of 0 dB. For the DHOBE algorithm, we chose α = 0.2, γ² = 1.0, and σ²(0) = 100. In all the examples shown here, the parameter estimates are taken to be the centers of the optimal bounding ellipsoids. The parameters were varied as follows:

Case 1. Slow variation in the parameter vector

The parameters a and b were varied by 1% every 10 samples, starting from the first sample, and the output data {y(t)} were generated for t = 1, 2, ..., 1000. It was observed that the bounding ellipsoids created by the DHOBE algorithm contained the true parameter at all time instants. The final parameter estimation error was 7.0xl0*-\. The parameter estimates, i.e., the centers of the OBE, are plotted against the true parameters in Fig. 4. From the figure it is clear that the DHOBE algorithm tracks slow time variations in the parameters quite well.

Case 2. Slow variation in the parameter vector from t = 500

The parameters a and b were varied by 1% every 10 samples, starting from the 500th sample. The final parameter estimation error was 3.0x10*^. All the bounding ellipsoids were seen to contain the true parameter. The parameter estimates are plotted against the true parameters in Fig. 5. The figure shows that the algorithm can track slow time variations in the parameters even after it has "converged".

Case 3. Jump in the MA parameter at t = 500

The parameter b was changed by 100% at the 500th sample, and a was kept constant at its nominal value at all times. Several runs of the DHOBE algorithm were performed with different input and noise sequences. It was observed that the true parameter vector was outside the bounding ellipsoid at t = 500 and was recaptured by the bounding ellipsoid after some number of samples (usually fewer than 50), thus verifying the claims made in Theorem 4. It was also observed that the jump makes the resulting bounding ellipsoids smaller. Intuitively, a jump at time t causes the sets S_i, i ≥ t, to have a smaller intersection with E_{i−1}, and so the ellipsoid which bounds the intersection is also smaller. In one particular run, the parameter was recaptured at t = 530 and the final parameter estimation error at t = 1000 was l.SxlO'"'. The parameter estimates (the centers of the bounding ellipsoids) are plotted against the true parameters in Fig. 6. Figure 7 shows the parameter estimates obtained for this run by applying the RLS algorithm with forgetting factors λ(t) = 0.9 and λ(t) = 0.99. Observe that the RLS parameter estimates are extremely jumpy when λ(t) = 0.9, probably because the forgetting factor is not large enough to average out the noise. Figure 8 shows the estimates when the variable forgetting factor proposed by Fortescue and Kershenbaum [13] is incorporated into the RLS algorithm. This variable forgetting factor, λ(t), is a function of the prediction error and is given by
A value of a' = 0.01 was used because it yields a steady-state tracking error of about the same magnitude as that of the DHOBE algorithm. From these figures, it is evident that the DHOBE algorithm can track jumps in the parameters at least as well as the exponentially weighted RLS algorithm.

The effect of varying γ² was also studied, using a value of γ² = 2. In this case, the true parameter did not jump out of the bounding ellipsoid at t = 500. The parameter estimates are identical to those in Fig. 6, but the ellipsoids are larger, as expected.

For a different run, i.e., with a different input and noise sequence, the jump at t = 500 caused σ²(t) to become negative. The rescue procedure was then used and yielded remarkable results: the true parameter was captured immediately at t = 501. The final parameter estimation error was 2.4xl0'^. Figure 9 shows that the parameters are tracked extremely rapidly in this case.
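The data-generating side of these experiments can be sketched as follows, using the nominal values a = −0.5, b = 1.0 and uniform u(t), v(t). This is an assumed reconstruction, not the authors' code:

- The seed and the zero initial condition y(0) = 0 are illustrative choices.
- The SNR is taken as the power of b·u(t) relative to the power of v(t).
- The parameter-variation schedules of Cases 1 to 3 are not reproduced.

```python
import math
import random

def simulate_arx11(a=-0.5, b=1.0, n=1000, seed=0):
    # y(t) = a*y(t-1) + b*u(t) + v(t), with u and v i.i.d. uniform on [-1, 1]
    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in range(n + 1)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(n + 1)]
    y = [0.0] * (n + 1)          # y(0) = 0 is an assumed initial condition
    for t in range(1, n + 1):
        y[t] = a * y[t - 1] + b * u[t] + v[t]
    return u, v, y

def snr_db(b, u, v):
    # Assumed SNR definition: power of b*u(t) relative to power of v(t).
    pu = sum((b * s) ** 2 for s in u) / len(u)
    pv = sum(s ** 2 for s in v) / len(v)
    return 10 * math.log10(pu / pv)

u, v, y = simulate_arx11()
print(round(snr_db(1.0, u, v), 2))   # near 0 dB, since u and v share one distribution
```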

Tracking  Performance  in  Gaussian  Noise 

It is well known that least-squares algorithms are optimal in the constant-parameter case for Gaussian distributed noise. It is thus interesting to compare the tracking abilities of the DHOBE and RLS algorithms in Gaussian noise. The same ARX model was used, with the noise sequence v(t) now generated as zero-mean white Gaussian noise with variance 0.25, which corresponds to an SNR of 1.25 dB. To satisfy the bounded-noise assumption, v(t) was truncated to the range [−1, 1), resulting in a slightly larger SNR. The parameter b was changed by 100% at the 500th sample, and a was kept constant at its nominal value at all times. Several runs of the DHOBE algorithm were performed with different noise sequences. As in the uniform noise case, the rescue procedure was activated in a few runs, causing extremely rapid acquisition of the parameter. In most of the runs, the true parameter was acquired by the bounding ellipsoid without requiring rescue, usually in fewer than twenty samples after the change occurred. Figure 10 compares the tracking performance of the RLS algorithm (with λ(t) = 0.9 and λ(t) = 0.99) to that of the DHOBE algorithm for a run in which the rescue procedure was not activated. The curves shown are plots of the estimates of parameter b by both algorithms. RLS with λ(t) = 0.9 appears to track a little faster than the DHOBE algorithm; however, its steady-state estimates are extremely jerky. The tracking performance of RLS with λ(t) = 0.99 is definitely inferior to that of the DHOBE algorithm, although its steady-state performance prior to the jump is superior. Note also that the DHOBE estimates become much less jerky after the jump on account of the decrease in the size of the ellipsoids.
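For reference, the exponentially weighted RLS algorithm used in these comparisons fits in a few lines. The sketch below is the textbook form of RLS with a constant forgetting factor λ; the variable forgetting factor of Fortescue and Kershenbaum is not reproduced. The initial covariance P(0) = p0·I, the noise-free data and the jump from b = 1.0 to b = 2.0 at t = 150 are hypothetical choices. They serve only to show the estimate following a jump when λ = 0.9.

```python
import random

def rls_forgetting(phis, ys, lam=0.99, p0=100.0):
    """Exponentially weighted RLS; returns the estimate after every sample."""
    n = len(phis[0])
    theta = [0.0] * n
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P(0) = p0 * I
    history = []
    for phi, y in zip(phis, ys):
        Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = lam + sum(f * q for f, q in zip(phi, Pphi))
        k = [q / denom for q in Pphi]                        # gain vector
        err = y - sum(t * f for t, f in zip(theta, phi))     # a priori prediction error
        theta = [t + ki * err for t, ki in zip(theta, k)]
        # P(t) = [P(t-1) - k phi^T P(t-1)] / lam; P is symmetric, so phi^T P = Pphi^T
        P = [[(P[i][j] - k[i] * Pphi[j]) / lam for j in range(n)] for i in range(n)]
        history.append(theta)
    return history

# Hypothetical noise-free ARX(1,1) data with a jump in b at t = 150
rng = random.Random(1)
a, y_prev, phis, ys = -0.5, 0.0, [], []
for t in range(300):
    b = 1.0 if t < 150 else 2.0
    u = rng.uniform(-1.0, 1.0)
    y = a * y_prev + b * u
    phis.append([y_prev, u])       # regressor [y(t-1), u(t)] for theta = [a, b]
    ys.append(y)
    y_prev = y

est = rls_forgetting(phis, ys, lam=0.9)
print(est[149], est[-1])   # approximately [-0.5, 1.0] before the jump and [-0.5, 2.0] after
```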

VI.  CONCLUSION 

The tracking properties of a recursive set-membership parameter estimation algorithm, viz. the DHOBE algorithm, have been investigated. Some sufficient conditions and some necessary conditions which ensure parameter tracking have been derived. A modification of the DHOBE algorithm is proposed to improve its tracking capability for larger parameter variations. Simulation results show that the tracking performance of the DHOBE algorithm is comparable to that of the exponentially weighted RLS algorithm. In some cases of large parameter jumps, the automatic activation of a rescue procedure causes the parameters to be tracked extremely rapidly.
CHAPTER 1


INTRODUCTION 


1.1  Formulation  of  the  Adaptive  Filtering  Problem 

An adaptive filter is one which can adjust its impulse response over time in order to reach some desired level of performance. The adjustment is carried out by an adaptive algorithm. This type of filtering is especially needed when dealing with unknown and/or changing environments. Many useful applications have been found for the adaptive filter, such as noise cancellation, echo cancellation in phone lines, equalization of a communication channel to combat intersymbol interference, and system identification. Detailed descriptions of these and other applications of adaptive filters can be found in [Ha86], [Wi75], [Ho84], and [Qu85].

The adaptive filtering problem will be approached here in the context of system identification. This is an important area of study for adaptive systems, since many applications of adaptive filters can be put in this context. In system identification, it is desired to characterize a system, usually called the plant (see Figure 1.1), with an adaptive filter, based only on the observable input/output data sequences x(n) and y(n) (see Figure 1.2). The plant/adaptive filter combination will be referred to as the adaptive system.
1.2 Notations

The systems of interest in this presentation for the plant and adaptive filter are those which can be represented by linear constant-coefficient difference equations. In the literature, three popular notations are used to express this difference equation input/output relationship. The one that is used depends largely on ease of explanation and the type of results that are needed. However, the varying notations can also be a source of confusion, so all three are explained here to avoid later misunderstanding of their meaning. The following notational examples describe a system with input x(n) yielding an output y(n):

1) Difference equation notation.

This is the standard notation found in most texts dealing with digital signal processing [Op75, Ch. 1]:

$$y(n) = -\sum_{i=1}^{n_a} a_i\,y(n-i) + \sum_{i=0}^{n_b} b_i\,x(n-i) \qquad (1.1)$$

The minus sign here is arbitrarily chosen to be consistent with the operator notation, shown next.
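As a concrete reading of (1.1), the Python sketch below evaluates the difference equation sample by sample. It assumes zero initial conditions (y(n) = x(n) = 0 for n < 0). The function name and the coefficient-list layout are illustrative.

```python
def difference_equation(a, b, x):
    """Evaluate (1.1) with zero initial conditions.

    a = [a_1, ..., a_na] and b = [b_0, ..., b_nb]; returns [y(0), ..., y(N-1)].
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[i - 1] * y[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
        y.append(acc)
    return y

print(difference_equation([-0.5], [1.0], [1.0, 0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]
```

With a_1 = −0.5 the minus sign in (1.1) gives y(n) = 0.5y(n−1) + x(n), so the impulse response halves at every step.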


2) Operator notation.

The operator q^{−i} is chosen to represent a delay of i samples in its operand signal, i.e., q^{−i}x(n) = x(n−i). This is analogous to the z-transform representation of a delayed signal. It is now possible to represent (1.1) in operator notation by defining the following polynomials:
Figure 1.1 The plant


Figure 1.2 The adaptive filter

Referring to Figure 1.1 and Figure 1.2, it is seen that two structures must be decided upon in the system identification problem, yielding a two-step modelling process:


1) The model of the plant structure. This decision is based on some knowledge of how the output signal, y(n), is generated from the input signal, x(n).

2) The adaptive filter structure. This decision is based on practical restrictions on the complexity of the adaptive filter and/or its corresponding adaptive algorithm.


In  Chapter  2,  an  assumption  which  is  common  in  the  field  of  system  identification 
will  be  made  on  the  plant  structure  and  two  popular  choices  for  the  adaptive  filter  will  be 
investigated.  These  two  structures  of  adaptive  systems  are  seen  in  Chapter  3  to  yield  two 
families  of  adaptive  algorithms. 
$$A(q^{-1}) = 1 + a_1 q^{-1} + a_2 q^{-2} + \cdots + a_{n_a} q^{-n_a}$$

$$B(q^{-1}) = b_0 + b_1 q^{-1} + b_2 q^{-2} + \cdots + b_{n_b} q^{-n_b}$$

An equivalent expression to (1.1) is thus:

$$A(q^{-1})\,y(n) = B(q^{-1})\,x(n)$$

Note that y(n) can be solved for, thus yielding:

$$y(n) = \frac{B(q^{-1})}{A(q^{-1})}\,x(n) \qquad (1.2)$$


It  is  important  to  note  here  that  the  operator  polynomial  appearing  as  a  denominator  term  in 
(1.2)  implies  the  existence  of  an  autoregressive  component  in  the  determination  of  the 
signal  y(n).  In  other  words,  y(n)  depends  on  past  values  ("regressive")  of  itself  ("auto")  in 
addition  to  the  current  and  past  values  of  the  input. 


It may appear as if there is a mixing of frequency domain and time domain notations in (1.2). However, this is not the case, since the delay operator was not defined as a complex transform variable as in z-transforms. In interpreting (1.2), it is helpful at first to mentally multiply the expression through by the denominator polynomial, A(q^{−1}). Since A(q^{−1}) begins with a "1," the first term is y(n), and all the other terms are autoregressive and can be moved to the right of the equation, yielding the explicit expression for y(n) of (1.1).


An example of operator notation which will be seen often is pure autoregressive filtering of a signal. This operation, applied to a signal y(n), appears in operator notation as:

$$y'(n) = \frac{1}{A(q^{-1})}\,y(n)$$

where the prime (') character is used here to denote the autoregressively filtered version of the unprimed signal. Expanding this notation as described in the previous paragraph yields the explicit difference equation relationship:

$$A(q^{-1})\,y'(n) = y(n)$$

$$y'(n) = y(n) - \sum_{i=1}^{n_a} a_i\,y'(n-i)$$
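The autoregressive filtering 1/A(q^{−1}) can be sketched the same way. Applying A(q^{−1}) to the filtered signal recovers the original, which is exactly the relationship A(q^{−1})y'(n) = y(n). Zero initial conditions and the example coefficients are assumptions; the function names are illustrative.

```python
def ar_filter(a, y):
    """y'(n) = y(n) - sum_{i=1}^{na} a_i y'(n-i), i.e. y' = y / A(q^-1), zero initial conditions."""
    yp = []
    for n in range(len(y)):
        yp.append(y[n] - sum(a[i - 1] * yp[n - i] for i in range(1, len(a) + 1) if n - i >= 0))
    return yp

def apply_A(a, s):
    """A(q^-1) s(n) = s(n) + sum_{i=1}^{na} a_i s(n-i), zero initial conditions."""
    return [s[n] + sum(a[i - 1] * s[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
            for n in range(len(s))]

a = [-0.5, 0.25]                 # illustrative a_1, a_2
y = [1.0, 2.0, 3.0, 4.0, 5.0]
yp = ar_filter(a, y)
print(apply_A(a, yp))            # recovers y(n) up to rounding
```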

3) Matrix notation.

Another convenient means of expressing (1.1) is through matrix operations. Define the parameter vector, θ, as:

$$\theta = [a_1\; a_2\; \cdots\; a_{n_a}\; b_0\; b_1\; \cdots\; b_{n_b}]^T$$

Also define the regressor vector, φ(n), as:

$$\varphi(n) = [-y(n-1)\; -y(n-2)\; \cdots\; -y(n-n_a)\; x(n)\; x(n-1)\; \cdots\; x(n-n_b)]^T$$

These definitions lead to the following equivalent expression for (1.1):

$$y(n) = \theta^T \varphi(n)$$
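The matrix form is only a repackaging of (1.1). The sketch below builds φ(n) from past samples and checks that θ^Tφ(n) reproduces the output of the explicit difference equation. The coefficient values are illustrative, and samples before n = 0 are taken to be zero.

```python
def past(s, k):
    """s(k) for k >= 0, and 0 before the start of the record."""
    return s[k] if k >= 0 else 0.0

def regressor(y, x, n, na, nb):
    """phi(n) = [-y(n-1) ... -y(n-na)  x(n) ... x(n-nb)]^T."""
    return [-past(y, n - i) for i in range(1, na + 1)] + [past(x, n - i) for i in range(nb + 1)]

a, b = [-0.5, 0.2], [1.0, 0.3]        # illustrative a_1, a_2 and b_0, b_1
theta = a + b                          # theta = [a_1 a_2 b_0 b_1]^T
x = [1.0, -1.0, 0.5, 2.0, 0.0, -0.5]

y_matrix, y_direct = [], []
for n in range(len(x)):
    # Matrix form: y(n) = theta^T phi(n)
    phi = regressor(y_matrix, x, n, na=2, nb=1)
    y_matrix.append(sum(t * p for t, p in zip(theta, phi)))
    # Difference-equation form (1.1)
    y_direct.append(sum(b[i] * past(x, n - i) for i in range(2))
                    - sum(a[i - 1] * past(y_direct, n - i) for i in range(1, 3)))

print(max(abs(p - q) for p, q in zip(y_matrix, y_direct)))  # 0 up to rounding
```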

1.3  Difference  Equation  Structures 

Five difference equation structures are most commonly encountered in the literature. In the following, they are presented in terms of the plant structure of Figure 1.1. Note that an unobservable, zero-mean white noise component, v(n), is present in all the cases, since an approximation is generally acceptable if it is correct up to some random, independent, zero-mean amount. The corresponding adaptive filter structures are obtained by adding a caret (^)* on top of the plant quantities y(n), a_i, b_i, c_i, n_a, n_b, n_c, and providing an estimate of the terms involving the unobservable signal, v(n). When v(n) appears alone as an additive modelling error of the plant, its estimate is its expected value, which is zero. In other words, the v(n) term is simply dropped in these cases (structures 1) - 3) below). The five structures, in order of increasing complexity, shown in both difference equation and operator notation, are:

1) Exogenous (X)

$$y(n) = \sum_{i=0}^{n_b} b_i\,x(n-i) + v(n)$$

$$y(n) = B(q^{-1})\,x(n) + v(n)$$

2) Autoregressive (AR)

$$y(n) = -\sum_{i=1}^{n_a} a_i\,y(n-i) + v(n)$$

$$A(q^{-1})\,y(n) = v(n)$$

3) Autoregressive, exogenous input (ARX)

$$y(n) = -\sum_{i=1}^{n_a} a_i\,y(n-i) + \sum_{i=0}^{n_b} b_i\,x(n-i) + v(n)$$

$$A(q^{-1})\,y(n) = B(q^{-1})\,x(n) + v(n)$$

4) Autoregressive, moving average (ARMA)

$$y(n) = -\sum_{i=1}^{n_a} a_i\,y(n-i) + \sum_{i=0}^{n_c} c_i\,v(n-i)$$

$$A(q^{-1})\,y(n) = C(q^{-1})\,v(n)$$

5) Autoregressive, moving average, exogenous input (ARMAX)

$$y(n) = -\sum_{i=1}^{n_a} a_i\,y(n-i) + \sum_{i=0}^{n_b} b_i\,x(n-i) + \sum_{i=0}^{n_c} c_i\,v(n-i)$$

$$A(q^{-1})\,y(n) = B(q^{-1})\,x(n) + C(q^{-1})\,v(n)$$

* The caret used in this manner denotes a quantity which is an estimate, in some sense, of the corresponding "uncareted" quantity.

The ARMAX model is the most general model which will be considered here, as it contains all the previously mentioned models 1) - 4) as special cases. Examples of other, more general approaches to modelling are given in [Lj83] and [Ab88]. Specifically, [Lj83] discusses the Box-Jenkins model, which extends the ARMAX model by replacing the A(q^{−1}), B(q^{−1}), and C(q^{−1}) polynomials with rational functions of polynomials. In [Ab88], the plant is modelled as a linear, continuous-time, time-varying process with no constraints on the order of the system. It is shown there how this very general model yields a nonlinear adaptive filter.

The  extent  to  which  a  given  plant  can  be  identified  with  an  adaptive  filter  will 
depend  of  course  on  how  well  the  chosen  plant  structure  approximates  the  true  physical 
system,  and  also  on  which  structure  is  chosen  for  the  adaptive  filter,  as  will  be  seen  in 
Chapter  2.  This  two-step  modelling  process  is  crucial  to  the  success  of  any  adaptive 
filtering  problem. 

1.4  Applications  of  Difference  Equation  Structures 

It  is  helpful  to  see  how  the  difference  equation  structures  presented  above  are 
utilized  by  considering  two  examples  of  their  use  in  practical  situations:  Linear  predictive 
coding  and  echo  cancellation. 




1.4.1  Linear  Predictive  Coding  (LPC)  Speech  Modelling 

An important modelling example in adaptive filtering is the characterization of the vocal process for the reproduction of speech. The LPC technique for speech modelling is a "black box" method. It assumes a known input of either an impulse train (for voiced sounds) or white noise (for unvoiced sounds), applied to an unknown time-varying system whose output is the final voice signal (see Figure 1.3). The unknown system corresponds to the "plant" in Figure 1.1, and the two-step modelling process of Section 1.1 must be applied in order to characterize this speech-producing system with an adaptive filter. When this is accomplished, speech sounds can be reproduced by exciting a system having the same characteristics as the vocal tract plant with the appropriate input signal. Since these characteristics are time-varying, the adaptive filter is especially suited to this application.


Figure  1.3  A  model  of  speech  production 


Following the two-step modelling process, it has been seen experimentally that an AR model for the vocal tract is a good choice for the plant structure of the adaptive system (see Figure 1.3). As for the adaptive filter structure, note that the input is not accessible to the adaptive filter. However, the input is assumed to be either white noise or an impulse train. Therefore, if an adaptive filter could reproduce the input given only the output voice signal, it would characterize the inverse of the vocal tract plant, and thus characterize the vocal tract itself. An adaptive filter with an X structure having input y(n) and output e(n) (see Figure 1.4) can accomplish this inversion when the adaptive filter coefficients are adjusted such that b̂_i = a_i, i = 0, 1, ..., n̂_b = n_a (a_0 = 1). It is shown in [Ha86] that choosing the b̂_i parameters such that the mean-square value of the adaptive filter output, e(n), is minimized will yield b̂_i = a_i.
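The inversion argument can be illustrated with a least-squares fit. Choosing b̂_1, ..., b̂_p to minimize the sum of e²(n), with e(n) = y(n) + Σ b̂_i y(n−i), is a finite-sample stand-in for minimizing the mean-square value of e(n). The sketch below is an illustration under stated assumptions, not the method of [Ha86]. It uses a white-noise-excited AR(2) signal with illustrative coefficients in place of speech and solves the normal equations directly. The helper names are hypothetical.

```python
import random

def solve(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting (small systems)."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]   # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (A[i][n] - sum(A[i][k] * z[k] for k in range(i + 1, n))) / A[i][i]
    return z

def lpc_least_squares(y, order):
    """b_1..b_p minimising sum_n e(n)^2 with e(n) = y(n) + sum_i b_i y(n-i), b_0 = 1."""
    idx = range(order, len(y))
    R = [[sum(y[n - i] * y[n - j] for n in idx) for j in range(1, order + 1)]
         for i in range(1, order + 1)]
    r = [-sum(y[n] * y[n - i] for n in idx) for i in range(1, order + 1)]
    return solve(R, r)

# Illustrative AR(2) "vocal tract": y(n) = -a_1 y(n-1) - a_2 y(n-2) + w(n)
rng = random.Random(0)
a_true = [-0.9, 0.5]
y = [0.0, 0.0]
for _ in range(5000):
    y.append(rng.gauss(0.0, 1.0) - a_true[0] * y[-1] - a_true[1] * y[-2])

b_hat = lpc_least_squares(y, order=2)
print([round(c, 2) for c in b_hat])   # close to a_true
```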

Figure 1.4 X structure for adaptive filter


1.4.2 Echo Cancellation

In telephone communication, a problem arises when the signal, x(n), which is transmitted over long distances via a "four-wire" line, reaches the "two-wire" line of the destination phone. An echo, y(n), is generated at the meeting of the two transmission lines (the hybrid), due to an impedance mismatch. This echo subsequently travels back to the source, where the speaker hears it (see Figure 1.5).

Figure 1.5 Echo generation in phone lines

If the hybrid characteristics were known, a fixed filter could be placed in parallel with the hybrid. The filter output would then be subtracted from the hybrid output, y(n), thus cancelling the echo. Since the hybrid characteristics are not known and may also be time-varying, a fixed filter will not solve the problem. However, an adaptive filter in this configuration, producing an output ŷ(n), has been shown to perform echo cancellation very well when its coefficients are adjusted such that the mean-square value of the error signal, y(n) − ŷ(n), is minimized. This is illustrated in Figure 1.6, where H(q^{−1}) and Ĥ(q^{−1}) stand for rational functions of the operator polynomials introduced in Section 1.2, analogous to the transfer function representation for linear, time-invariant systems.


Figure 1.6 Model of hybrid and echo canceller


At this point, again, models must be decided upon for both the hybrid "plant" and the adaptive filter, through specific choices for H(q^{−1}) and Ĥ(q^{−1}). The hybrid has been modelled in different ways, giving rise to various adaptive filter structures. The simplest model of the hybrid is to consider it as an X process [Ha86], [Ho84]. In other words, y(n) depends only on a weighted sum of current and past inputs. Since the input is available for use by the adaptive filter, the X structure should be adapted so that b̂_i = b_i, i = 0, ..., n̂_b = n_b, and thus the error signal will be only white noise. This plant/adaptive filter combination corresponds to the choices:

$$H(q^{-1}) = B(q^{-1})$$

$$\hat{H}(q^{-1}) = \hat{B}(q^{-1})$$

where B(q^{−1}) and B̂(q^{−1}) are defined as in Section 1.2.

A slightly more complicated structure for the hybrid models it as an ARX process, which computes its output as a weighted sum of both the incoming signal, x(n−i), i = 0, ..., n_b, and its own past outputs, y(n−i), i = 1, ..., n_a. This is a more realistic system, as this recursive structure contains poles as well as zeros. Note that the adaptive filter in this case can still be chosen with an X structure, because both signals x(n) and y(n) are available to use as inputs. When the adaptive filter weights which multiply x(n−i), i = 0, ..., n_b, are equal to the corresponding b_i, and those which multiply y(n−i), i = 1, ..., n_a, are equal to a_i, the error signal will again be white noise.

Finally, a more realistic model [Fa88], which will be considered in some detail later, represents the hybrid as an ARX process with no internal noise term (unlike the ARX process above) but whose output, p(n), is corrupted by white measurement noise, v(n). It will be shown in Chapter 2 that this model of the hybrid is actually an ARMAX process with c_i = a_i, i = 1, ..., n_a. It will further be shown that the proper structure for the adaptive filter to cancel the echo is the ARX structure. In other words, the adaptive filter must be recursive (IIR) in order to cancel the echo, y(n), produced at the hybrid. Referring again to Figure 1.6, this plant and adaptive filter are characterized by:

$$H(q^{-1}) = \frac{B(q^{-1})}{A(q^{-1})}$$

$$\hat{H}(q^{-1}) = \frac{\hat{B}(q^{-1})}{\hat{A}(q^{-1})}$$
Note that if v(n) ≡ 0, the situation reduces to the ARX plant described in the preceding paragraph. Thus the simpler X structure adaptive filter with inputs x(n) and y(n) = p(n) can be used. The system structure for this situation is slightly different from the one shown in Figure 1.6 and is shown in Figure 1.7. In this case two FIR filters are used: one which realizes the zeros and one which realizes the poles of Ĥ(q^{−1}). It is important to see here how, when Â(q^{−1}) = A(q^{−1}) and B̂(q^{−1}) = B(q^{−1}), the signal e(n) is zero.
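The zero-error property can be checked numerically. The sketch below assumes that the two FIR filters of Figure 1.7 combine as e(n) = Â(q^{−1})y(n) − B̂(q^{−1})x(n), with Â(q^{−1}) having leading coefficient 1. It uses illustrative hybrid coefficients with v(n) ≡ 0. The function names are hypothetical.

```python
def fir(coeffs, s, n):
    """sum_i coeffs[i] * s(n - i), with s(k) = 0 for k < 0."""
    return sum(c * s[n - i] for i, c in enumerate(coeffs) if n - i >= 0)

def echo_error(a_hat, b_hat, x, y):
    """e(n) = A_hat(q^-1) y(n) - B_hat(q^-1) x(n), with a_hat = [1, a_1, ..., a_na]."""
    return [fir(a_hat, y, n) - fir(b_hat, x, n) for n in range(len(x))]

# Noise-free recursive hybrid A(q^-1) y(n) = B(q^-1) x(n), illustrative coefficients
A, B = [1.0, -0.6, 0.2], [0.5, 0.3]
x = [1.0, -0.5, 0.25, 2.0, -1.0, 0.0, 0.75]
y = []
for n in range(len(x)):
    y.append(fir(B, x, n) - sum(A[i] * y[n - i] for i in range(1, len(A)) if n - i >= 0))

print(max(abs(e) for e in echo_error(A, B, x, y)))                 # ~0: echo cancelled
print(max(abs(e) for e in echo_error([1.0, -0.5, 0.2], B, x, y)))  # mismatch leaves residual echo
```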


Figure 1.7 Echo cancellation for a recursive plant when v(n) ≡ 0


1.5  The  Mean  Square  Error  Criterion 

In order for an adaptive algorithm to adjust the impulse response of its adaptive filter, the algorithm must somehow be able to gauge its progress to determine how to make the adjustment. A natural criterion on which to base this adjustment is the difference between the output of the plant, y(n), and that of the adaptive filter, ŷ(n). Thus the error signal is defined as e(n) = y(n) − ŷ(n).† Intuitively, the magnitude of the error signal should be as small as possible for desired operation. Mathematically, however, the criterion |y(n) − ŷ(n)| is not very attractive. A mathematically sounder criterion, leading to efficient adaptive algorithms, is the mean-square error (MSE), which is expressed as E{e²(n)}. The squaring operation provides an alternative to the absolute value operation. Statistical expectation is needed because, as noted previously, the plant models are all assumed to be accurate to within an independent, zero-mean noise term, v(n).

† The notation e(n) will be used to refer to the error signal of any adaptive system. In Chapter 3, error quantities for specific adaptive systems will be defined as ee(n) and oe(n). The difference between ee(n) and oe(n) will be seen to arise from the manner in which the adaptive filter output, ŷ(n), is generated.

The squaring operation can also be viewed as providing a criterion which tends to emphasize larger values of the error signal while diminishing the importance of smaller errors. The absolute value operation, by contrast, assigns an error penalty that grows linearly with the magnitude of the error, |e(n)|. This is an intuitively reasonable characteristic for a criterion of "goodness" to have. However, it also causes some algorithms to adapt more slowly as the MSE of the adaptive filter decreases. Depending on the goal of the adaptive system, this may or may not be of significance.
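The difference between the two penalties is easiest to see through their derivatives. The derivative of e² is 2e, which shrinks as the error shrinks, while the derivative of |e| has constant magnitude. A gradient-based algorithm driven by e² therefore takes smaller steps near the optimum, which is one way to read the remark about slower adaptation. The helper names below are illustrative.

```python
def penalties(e):
    """Squared and absolute error penalties for a single error value."""
    return e * e, abs(e)

def gradients(e):
    """d(e^2)/de = 2e shrinks with the error; d|e|/de = sign(e) does not (for e != 0)."""
    return 2.0 * e, float((e > 0) - (e < 0))

def sample_mse(errors):
    """Sample-average estimate of E{e^2(n)}."""
    return sum(e * e for e in errors) / len(errors)

for e in (2.0, 0.5, 0.01):
    print(e, penalties(e), gradients(e))
```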

1.6 Overview

In what follows in this thesis, some basics in the area of adaptive systems will be developed from the perspective of system identification, together with experimental results obtained by the author. In particular, Chapter 2 introduces two important classes of system identification models: the equation error and output error adaptive systems. Chapter 3 presents methods of adjusting an adaptive filter (i.e., adaptive algorithms) in the context of both the equation error and output error adaptive systems. This will be seen to give rise to two different families of adaptive algorithms. In Chapter 4, an adaptive algorithm is presented which combines elements from two adaptive schemes studied in Chapter 3. Simulation results are given which show empirically that the algorithm works, and comparisons are made with the standard method. Finally, two output error algorithms developed by the author are presented for review. These are preliminary results, and no simulations have been performed yet.
CHAPTER 2


CONCEPTS  OF  ADAPTIVE  FILTERING 


2.1  Modelling  Techniques 


In system identification the plant structure is usually modelled as a rational transfer function whose output, p(n), is corrupted by additive white measurement noise, v(n), to yield the observable signal, y(n) (see Figure 2.1). This plant is a special case of the ARMAX structure, which can be seen by taking the expression for the output:

$$y(n) = \frac{B(q^{-1})}{A(q^{-1})}\,x(n) + v(n)$$

and multiplying through by the A(q^{−1}) polynomial to yield:

$$A(q^{-1})\,y(n) = B(q^{-1})\,x(n) + A(q^{-1})\,v(n)$$

Or, equivalently, using difference equation notation:

$$y(n) = -\sum_{i=1}^{n_a} a_i\,y(n-i) + \sum_{i=0}^{n_b} b_i\,x(n-i) + \sum_{i=0}^{n_a} a_i\,v(n-i) \qquad (2.1)$$

Note that in the last summation above, a_0 = 1. It is thus seen that the plant structure of Figure 2.1 is a special case of the ARMAX structure with C(q^{−1}) = A(q^{−1}).
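The equivalence with an ARMAX structure can be verified on simulated data. The sketch below generates p(n) = [B(q^{−1})/A(q^{−1})]x(n), adds noise, and checks that A(q^{−1})y(n) − B(q^{−1})x(n) − A(q^{−1})v(n) vanishes. The coefficients, noise level and seed are illustrative assumptions. Zero initial conditions are used throughout.

```python
import random

def filt(coeffs, s, n):
    """sum_i coeffs[i] * s(n - i), with s(k) = 0 for k < 0."""
    return sum(c * s[n - i] for i, c in enumerate(coeffs) if n - i >= 0)

A = [1.0, -0.7, 0.1]     # illustrative A(q^-1) = 1 - 0.7 q^-1 + 0.1 q^-2
B = [1.0, 0.4]           # illustrative B(q^-1) = 1 + 0.4 q^-1
rng = random.Random(2)
N = 200
x = [rng.uniform(-1.0, 1.0) for _ in range(N)]
v = [rng.gauss(0.0, 0.1) for _ in range(N)]

p = []                   # noise-free output p(n) = [B(q^-1) / A(q^-1)] x(n)
for n in range(N):
    p.append(filt(B, x, n) - sum(A[i] * p[n - i] for i in range(1, len(A)) if n - i >= 0))
y = [pn + vn for pn, vn in zip(p, v)]   # observed y(n) = p(n) + v(n)

# ARMAX residual with C(q^-1) = A(q^-1): vanishes up to rounding
resid = [filt(A, y, n) - filt(B, x, n) - filt(A, v, n) for n in range(N)]
print(max(abs(r) for r in resid))
```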
Figure 2.1 An ARMAX plant with C(q^{−1}) = A(q^{−1})

Given this plant, two approaches can be taken in modelling it with an adaptive filter: the equation error approach and the output error approach.

2.1.1  Equation  Error  Model 

Since x(n) and y(n) are both measurable, the simplest approach is to use these signals to form the output of the adaptive filter, similar to how the plant forms its output in (2.1). This approach yields the expression for the adaptive filter output:

$$\hat{y}(n) = -\sum_{i=1}^{n_a} \hat{a}_i(n-1)\,y(n-i) + \sum_{i=0}^{n_b} \hat{b}_i(n-1)\,x(n-i) \qquad (2.2)$$

Since the v(n) terms of the plant cannot be measured, they are neglected. Note that the last available values of the adaptive filter parameter estimates, â_i(n−1) and b̂_i(n−1), are used to determine the adaptive filter output. An adaptive algorithm uses the error signal, y(n) − ŷ(n), to determine the new "current" estimates, â_i(n) and b̂_i(n).

Expression (2.2) can be written in matrix notation as:

$$\hat y(n) = \hat\theta^T(n-1)\,\varphi_{ee}(n) \qquad (2.3)$$

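where the regressor vector, $\varphi_{ee}(n)$, and the parameter estimate vector, $\hat\theta(n-1)$, are defined as follows:

$$\varphi_{ee}(n) = [-y(n-1)\;\cdots\;-y(n-n_a)\;\;x(n)\;\cdots\;x(n-n_b)]^T$$

$$\hat\theta(n-1) = [\hat a_1(n-1)\;\cdots\;\hat a_{n_a}(n-1)\;\;\hat b_0(n-1)\;\cdots\;\hat b_{n_b}(n-1)]^T$$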

Using  operator  notation,  the  description  of  the  adaptive  system  is: 
$$y(n) = [1 - A(q^{-1})]\,y(n) + B(q^{-1})\,x(n) + A(q^{-1})\,v(n) \qquad (2.4)$$

$$\hat y(n) = [1 - \hat A(q^{-1},n-1)]\,y(n) + \hat B(q^{-1},n-1)\,x(n) \qquad (2.5)$$

The structures of the equation error model resulting from the two-step modelling process described in Chapter 1 can be recognized from (2.1) and (2.2) as ARMAX for the plant and X for the adaptive filter.

This adaptive system generates an error signal, $y(n) - \hat y(n)$, known as the equation error, ee(n). Subtracting (2.5) from (2.4) yields:

$$\begin{aligned} \mathrm{ee}(n) &= y(n) - \hat y(n) \\ &= -[A(q^{-1}) - \hat A(q^{-1},n-1)]\,y(n) + [B(q^{-1}) - \hat B(q^{-1},n-1)]\,x(n) + A(q^{-1})\,v(n) \qquad (2.6) \end{aligned}$$

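Re-expressing (2.6) in matrix notation yields the following useful relationship between ee(n) and the parameter error vector, $\tilde\theta(n-1) = \theta_p - \hat\theta(n-1)$, where $\theta_p = [a_1\;\cdots\;a_{n_a}\;\;b_0\;\cdots\;b_{n_b}]^T$ is the plant parameter vector: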

$$\mathrm{ee}(n) = \tilde\theta^T(n-1)\,\varphi_{ee}(n) + A(q^{-1})\,v(n) \qquad (2.7)$$

An alternative expression for ee(n) can be obtained from (2.6) by noting from (2.1) that $-A(q^{-1})y(n) + B(q^{-1})x(n) + A(q^{-1})v(n) = 0$. This yields the following expression for the equation error:

$$\mathrm{ee}(n) = \hat A(q^{-1},n-1)\,y(n) - \hat B(q^{-1},n-1)\,x(n) \qquad (2.8)$$

Equation (2.8) implies the series-parallel structure of Figure 2.2 for the equation error model of an adaptive system. Note that this structure requires only FIR filters for its implementation.


Figure 2.2 Series-parallel structure of the equation error adaptive system

2.1.2  Output  Error  Model 


The  output  error  model  adaptive  filter  attempts  to  duplicate  the  structure  of  the 
assumed  plant  model.  This  adaptive  system  can  most  easily  be  introduced  through  a 
diagram  of  its  structure,  shown  in  Figure  2.3. 


Figure 2.3 Parallel structure of the output error adaptive system
A few comments are in order regarding this alternative system identification structure. Note the parallel structure of the output error model, in contrast to the series-parallel structure of the equation error model. This implies that the adaptive filter is independent of the plant, sharing only the common input signal. An important consequence of this characteristic is that the noise, v(n), is not introduced into the adaptive filter, as it is in the equation error model through $y(n) = p(n) + v(n)$ (see Figure 2.2). Thus it is reasonable to expect that the measurement noise, v(n), will have less of an effect on the performance of the output error model than on that of the equation error model. This is in fact true, as will be shown shortly. It is also important to note that the adaptive filter in the output error model is an IIR filter. In other words, the autoregressive portion of the adaptive filter uses its own past outputs, $\hat y(n-i)$, $i = 1, \ldots, n_a$, in determining its current output $\hat y(n)$. This is in contrast to the equation error model, which uses past values of the plant output, $y(n-i)$, $i = 1, \ldots, n_a$, in determining $\hat y(n)$. The output of the adaptive filter can be expressed compactly in matrix notation similar to that of (2.3) for the equation error adaptive filter:

$$\hat y(n) = \hat\theta^T(n-1)\,\varphi_{oe}(n)$$

where 

$$\varphi_{oe}(n) = [-\hat y(n-1)\;\;-\hat y(n-2)\;\cdots\;-\hat y(n-n_a)\;\;x(n)\;\;x(n-1)\;\cdots\;x(n-n_b)]^T \qquad (2.9)$$

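The difference between the regressors in (2.3) and (2.9) captures very concisely the fundamental difference between the equation error and the output error methods: $\varphi_{ee}(n)$ contains past measured plant outputs, $y(n-i)$, whereas $\varphi_{oe}(n)$ contains past adaptive filter outputs, $\hat y(n-i)$.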
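To make the contrast between (2.3) and (2.9) concrete, the following Python sketch builds both regressors and runs each adaptive filter with its parameters frozen at the plant values, $\hat\theta = \theta_p$. The plant coefficients, noise level, zero initial conditions, and helper names are illustrative assumptions. Under these assumptions the output error residual reproduces v(n), while the equation error residual is the filtered noise $A(q^{-1})v(n)$, anticipating (2.7) and (2.14).

```python
import random

def past(s, k):
    # Signals are assumed to be zero before time 0
    return s[k] if k >= 0 else 0.0

def phi_ee(y, x, n, na, nb):
    # Equation error regressor (2.3): past *measured* plant outputs y(n-i)
    return [-past(y, n - i) for i in range(1, na + 1)] + [past(x, n - i) for i in range(nb + 1)]

def phi_oe(yhat, x, n, na, nb):
    # Output error regressor (2.9): past *adaptive filter* outputs yhat(n-i)
    return [-past(yhat, n - i) for i in range(1, na + 1)] + [past(x, n - i) for i in range(nb + 1)]

def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))

def predictions(theta, y, x, na, nb):
    """Outputs of both adaptive filters for a fixed theta = [a1..a_na, b0..b_nb]."""
    N = len(x)
    yhat_ee = [dot(theta, phi_ee(y, x, n, na, nb)) for n in range(N)]
    yhat_oe = [0.0] * N
    for n in range(N):  # recursive: each output is fed back into later regressors
        yhat_oe[n] = dot(theta, phi_oe(yhat_oe, x, n, na, nb))
    return yhat_ee, yhat_oe

# Hypothetical plant: A = 1 - 1.2 q^-1 + 0.5 q^-2, B = 1 + 0.4 q^-1
random.seed(1)
na, nb, N = 2, 1, 300
a, b = [-1.2, 0.5], [1.0, 0.4]
x = [random.gauss(0.0, 1.0) for _ in range(N)]
v = [random.gauss(0.0, 0.2) for _ in range(N)]
p = [0.0] * N
for n in range(N):
    p[n] = dot(a + b, phi_oe(p, x, n, na, nb))   # noise-free plant output
y = [p[n] + v[n] for n in range(N)]

yhat_ee, yhat_oe = predictions(a + b, y, x, na, nb)   # both filters set to theta_p
ee = [y[n] - yhat_ee[n] for n in range(N)]
oe = [y[n] - yhat_oe[n] for n in range(N)]
av = [v[n] + a[0] * past(v, n - 1) + a[1] * past(v, n - 2) for n in range(N)]  # A(q^-1) v(n)
```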

Referring to Figure 2.3, the output error adaptive system is seen to be described by the following equations:


$$A(q^{-1})\,y(n) = B(q^{-1})\,x(n) + A(q^{-1})\,v(n) \qquad (2.10)$$

$$\hat A(q^{-1},n-1)\,\hat y(n) = \hat B(q^{-1},n-1)\,x(n) \qquad (2.11)$$
The two-step modelling process described in Chapter 1 has thus yielded the structures ARMAX and ARX for the plant and adaptive filter, respectively.

The expression for the error signal, $y(n) - \hat y(n)$, can now be derived. This signal is known as the output error, oe(n). Subtracting (2.11) from (2.10) yields:

$$A(q^{-1})\,y(n) - \hat A(q^{-1},n-1)\,\hat y(n) = [B(q^{-1}) - \hat B(q^{-1},n-1)]\,x(n) + A(q^{-1})\,v(n) \qquad (2.12)$$

To obtain an expression for the output error, $\mathrm{oe}(n) = y(n) - \hat y(n)$, either $A(q^{-1})\hat y(n)$ or $\hat A(q^{-1},n-1)y(n)$ can be added to and subtracted from the left side of (2.12). This results in two different interpretations of oe(n). Choosing $\hat A(q^{-1},n-1)y(n)$ yields:

$$A(q^{-1})y(n) - \hat A(q^{-1},n-1)\hat y(n) + \hat A(q^{-1},n-1)y(n) - \hat A(q^{-1},n-1)y(n) = [B(q^{-1}) - \hat B(q^{-1},n-1)]\,x(n) + A(q^{-1})\,v(n)$$

Factoring out $\mathrm{oe}(n) = y(n) - \hat y(n)$ gives:

$$\hat A(q^{-1},n-1)\,\mathrm{oe}(n) + [A(q^{-1}) - \hat A(q^{-1},n-1)]\,y(n) = [B(q^{-1}) - \hat B(q^{-1},n-1)]\,x(n) + A(q^{-1})\,v(n)$$

Solving  for  oe(n): 

$$\hat A(q^{-1},n-1)\,\mathrm{oe}(n) = \{-[A(q^{-1}) - \hat A(q^{-1},n-1)]\,y(n) + [B(q^{-1}) - \hat B(q^{-1},n-1)]\,x(n) + A(q^{-1})\,v(n)\}$$

The term in braces can be recognized from (2.6) as the equation error, ee(n). Thus

$$\mathrm{oe}(n) = \frac{1}{\hat A(q^{-1},n-1)}\,\mathrm{ee}(n) \qquad (2.13)$$
It is of interest to examine Equation (2.13), since it reveals a relation between the two adaptive system models. Namely, given the same input and noise sequences and the same parameter estimates in both the equation error and output error models, the resulting error sequences are related through filtering by the adaptive filter's denominator polynomial, $\hat A(q^{-1},n-1)$. This relationship will be exploited later in the development of an adaptive algorithm.
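Relation (2.13) can be checked numerically with the parameters held fixed, the setting in which it holds exactly for zero initial conditions. The Python sketch below assumes a hypothetical plant and an arbitrary stable fixed estimate $\hat A(q^{-1})$, $\hat B(q^{-1})$; it forms ee(n) with the two FIR filters of (2.8), forms oe(n) with the parallel IIR model of Figure 2.3, and then filters ee(n) by $1/\hat A(q^{-1})$. The helper `filt` is written out only to keep the sketch self-contained; it is not from the text.

```python
import random

def past(s, k):
    # Signals are assumed to be zero before time 0
    return s[k] if k >= 0 else 0.0

def filt(num, den, u):
    """w = [num(q^-1) / den(q^-1)] u with zero initial state; den[0] must be 1."""
    w = [0.0] * len(u)
    for n in range(len(u)):
        acc = sum(num[i] * past(u, n - i) for i in range(len(num)))
        acc -= sum(den[i] * past(w, n - i) for i in range(1, len(den)))
        w[n] = acc
    return w

random.seed(2)
N = 300
A, B = [1.0, -1.2, 0.5], [1.0, 0.4]          # hypothetical plant polynomials (a0 = 1)
A_hat, B_hat = [1.0, -0.9, 0.2], [0.8, 0.1]  # arbitrary fixed, stable estimates
x = [random.gauss(0.0, 1.0) for _ in range(N)]
v = [random.gauss(0.0, 0.3) for _ in range(N)]
y = [pn + vn for pn, vn in zip(filt(B, A, x), v)]          # plant of Figure 2.1

# Equation error (2.8): two FIR filters, ee = A_hat y - B_hat x
ee = [u - w for u, w in zip(filt(A_hat, [1.0], y), filt(B_hat, [1.0], x))]

# Output error: parallel IIR model yhat = (B_hat / A_hat) x, oe = y - yhat
oe = [yn - yh for yn, yh in zip(y, filt(B_hat, A_hat, x))]

# (2.13): filtering ee by 1 / A_hat should reproduce oe
oe_from_ee = filt([1.0], A_hat, ee)
max_gap = max(abs(u - w) for u, w in zip(oe, oe_from_ee))
print("largest |oe - ee/A_hat|:", max_gap)
```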

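Choosing instead the term $A(q^{-1})\hat y(n)$ to add to and subtract from the left side of (2.12) yields a relationship, analogous to the equation error expression (2.7), between oe(n) and the parameter error vector, $\tilde\theta(n-1) = \theta_p - \hat\theta(n-1)$, as defined for (2.7) [Jo84]:

$$A(q^{-1})y(n) - \hat A(q^{-1},n-1)\hat y(n) + A(q^{-1})\hat y(n) - A(q^{-1})\hat y(n) = [B(q^{-1}) - \hat B(q^{-1},n-1)]\,x(n) + A(q^{-1})\,v(n)$$

Similarly factoring and simplifying yields:

$$A(q^{-1})\,\mathrm{oe}(n) + [A(q^{-1}) - \hat A(q^{-1},n-1)]\,\hat y(n) = [B(q^{-1}) - \hat B(q^{-1},n-1)]\,x(n) + A(q^{-1})\,v(n)$$

$$\mathrm{oe}(n) = \frac{1}{A(q^{-1})}\{-[A(q^{-1}) - \hat A(q^{-1},n-1)]\,\hat y(n) + [B(q^{-1}) - \hat B(q^{-1},n-1)]\,x(n)\} + v(n)$$

$$\mathrm{oe}(n) = \frac{1}{A(q^{-1})}\,\tilde\theta^T(n-1)\,\varphi_{oe}(n) + v(n) \qquad (2.14)$$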


Expressions (2.14) and (2.7) make very clear the effect of the noise, v(n), in each of the adaptive system models. In the output error model, assuming v(n) is white, the noise power is simply $\sigma_v^2$ from (2.14). However, examination of (2.7) shows the noise power of the equation error model to be $(1 + a_1^2 + a_2^2 + \cdots + a_{n_a}^2)\,\sigma_v^2$. Since this factor exceeds one whenever the plant has any autoregressive part, the measurement noise affects the equation error model more than it affects the output error model. The implications of these additional effects of the noise in the equation error model are discussed in Section 2.3.1, in particular with respect to the quality of the parameter estimates, $\hat\theta$. It will be shown that the presence of the measurement noise, v(n), produces a bias in the equation error estimates, $\hat\theta$, with respect to the plant parameters, $\theta_p$.
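The noise-power comparison can be checked by simulation. The sketch below assumes a hypothetical $A(q^{-1}) = 1 - 1.2q^{-1} + 0.5q^{-2}$ and $\sigma_v = 0.5$, and estimates the power of v(n), the noise term in (2.14), and of $A(q^{-1})v(n)$, the noise term in (2.7). Under these assumptions their ratio should be close to $1 + a_1^2 + a_2^2 = 2.69$.

```python
import random

random.seed(5)
sigma_v = 0.5
a = [1.0, -1.2, 0.5]            # hypothetical A(q^-1) coefficients, a0 = 1
N = 200_000
v = [random.gauss(0.0, sigma_v) for _ in range(N)]

# Noise term of the equation error, A(q^-1) v(n), as it appears in (2.7)
av = [sum(a[i] * v[n - i] for i in range(len(a)) if n - i >= 0) for n in range(N)]

def mean_square(s):
    return sum(t * t for t in s) / len(s)

oe_noise_power = mean_square(v)              # noise power in (2.14): about sigma_v^2
ee_noise_power = mean_square(av)             # noise power in (2.7)
predicted_ratio = sum(c * c for c in a)      # 1 + a1^2 + a2^2
print(oe_noise_power, ee_noise_power / oe_noise_power, predicted_ratio)
```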

2.2  The  Mean  Square  Error  Surface 

As discussed in Chapter 1, both the criterion of performance and the adjustment mechanism depend on the nature of the error signal, $y(n) - \hat y(n)$. Therefore it is important to examine the characteristics of this signal for the output error and equation error models. In particular, the mean square error (MSE) surface will be discussed for both models. The MSE surface is the relationship between the MSE and the adaptive filter coefficients. Later, in Section 2.3.2, these characteristics will be examined with respect to how they affect the ability of an adaptive filter to reach an "optimal" state.

2.2.1  Equation  Error  Surface 

In the equation error derivations which follow here and in Section 2.3.1, the parameter estimate vector, $\hat\theta$, will be considered to be a constant quantity with respect to the statistics of the input process, x(n). This assumption is not strictly true since, as will be seen in Chapter 3, the parameters are updated by algorithms which use the input, among other things, to accomplish the updating process. The following results effectively evaluate the characteristics of an adaptive filter whose impulse response, $h(i) = \hat\theta_i$, is set to some arbitrary constant value. Considering the adaptive filter in this way permits the use of techniques from Wiener filtering theory [Ha86, Ch. 3]. This provides a means to evaluate the performance of an adaptive filter and establishes a useful basis for comparing an adaptive filter to the ideal, time-invariant situation. The comparison is especially valid under the reasonable and common assumption of a slowly time-varying adaptive filter.

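In deriving the expression for the MSE surface, it is helpful to use matrix notation. Recall equation (2.8), reprinted here in matrix form as well, with the time index of the parameters suppressed and $\varphi(n) = \varphi_{ee}(n)$:

$$\mathrm{ee}(n) = \hat A(q^{-1})\,y(n) - \hat B(q^{-1})\,x(n) = y(n) - \hat\theta^T\varphi(n) \qquad (2.15)$$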

Note here that ee(n) is linear in the parameters. Thus $\mathrm{ee}^2(n)$ is quadratic in the parameters, yielding a parabolic MSE surface, as shown in the following. Squaring, expanding, and taking expectations of (2.15) yields:

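$$E[\mathrm{ee}^2(n)] = E[y^2(n)] - 2E[y(n)\,\hat\theta^T\varphi(n)] + \hat\theta^T E[\varphi(n)\varphi^T(n)]\,\hat\theta$$
Defining $R(n) = E[\varphi(n)\varphi^T(n)]$ and $\sigma_y^2(n) = E[y^2(n)]$, and rearranging, yields:

$$E[\mathrm{ee}^2(n)] = \sigma_y^2(n) - 2\hat\theta^T E[y(n)\varphi(n)] + \hat\theta^T R(n)\,\hat\theta \qquad (2.16)$$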

For stationary processes, the expectation operation yields values independent of n. Thus $\sigma_y^2(n) = \sigma_y^2$ and $R(n) = R$ are a constant value and matrix, respectively, $E[y(n)\varphi(n)]$ is a constant vector, and the equation error model possesses an error surface which is quadratic in the parameters¹. This is a very desirable property: when R is positive definite, the surface has a single minimum and no other local minima, and many adaptive schemes require the absence of local minima for guaranteed convergence.
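The quadratic form (2.16) can be checked numerically. In the sketch below, which assumes a hypothetical plant and uses sample averages in place of the expectations, the MSE computed directly from (2.15) matches the quadratic expression (2.16) at an arbitrary $\hat\theta$, the sample R is positive definite, and the stationary point obtained by solving $R\hat\theta = E[y(n)\varphi(n)]$ gives a lower MSE than the arbitrary test point. The helper `delayed` and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, na, nb = 20_000, 2, 1
a = np.array([-1.2, 0.5])    # hypothetical plant: A = 1 - 1.2 q^-1 + 0.5 q^-2
b = np.array([1.0, 0.4])     #                    B = 1 + 0.4 q^-1
x = rng.standard_normal(N)
v = 0.3 * rng.standard_normal(N)

# Noise-free plant output p(n) = [B/A] x(n), then y(n) = p(n) + v(n) (Figure 2.1)
p = np.zeros(N)
for n in range(N):
    p[n] = (sum(b[i] * x[n - i] for i in range(nb + 1) if n - i >= 0)
            - sum(a[i - 1] * p[n - i] for i in range(1, na + 1) if n - i >= 0))
y = p + v

def delayed(s, k):
    """s(n - k), with zeros for n < k."""
    out = np.zeros_like(s)
    out[k:] = s[:len(s) - k]
    return out

# Row n of Phi is phi_ee(n)^T = [-y(n-1) ... -y(n-na)  x(n) ... x(n-nb)]
Phi = np.column_stack([-delayed(y, i) for i in range(1, na + 1)]
                      + [delayed(x, i) for i in range(nb + 1)])

# Sample averages stand in for the expectations in (2.16)
sigma_y2 = np.mean(y ** 2)
r_yphi = Phi.T @ y / N      # E[y(n) phi(n)]
R = Phi.T @ Phi / N         # E[phi(n) phi(n)^T]

def mse_quadratic(theta):
    return sigma_y2 - 2.0 * theta @ r_yphi + theta @ R @ theta   # (2.16)

def mse_direct(theta):
    return np.mean((y - Phi @ theta) ** 2)                       # mean of ee^2(n), (2.15)

theta_test = np.array([-0.5, 0.1, 0.7, 0.2])   # arbitrary point on the surface
theta_min = np.linalg.solve(R, r_yphi)         # stationary point of the quadratic
print(mse_quadratic(theta_test), mse_direct(theta_test), mse_direct(theta_min))
```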

Another useful expression for the mean-square equation error (MSEE) can be derived using (2.7), reprinted here for convenience:

$$\mathrm{ee}(n) = \tilde\theta^T\varphi_{ee}(n) + A(q^{-1})\,v(n)$$

Squaring  and  taking  expectations  yields: 

¹ The notation $E[e^2(n)]$ for the MSE surface is mathematically misleading because there is no indication of the dependence of this function on $\hat\theta$. However, this is the standard notation in the literature, and the dependence on $\hat\theta$ must be tacitly assumed.
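$$E[\mathrm{ee}^2(n)] = \tilde\theta^T R\,\tilde\theta + 2\tilde\theta^T E\{[A(q^{-1})v(n)]\,\varphi_{ee}(n)\} + (1 + a_1^2 + a_2^2 + \cdots + a_{n_a}^2)\,\sigma_v^2$$

The second term can be simplified by first expanding the operator polynomial, $A(q^{-1})$:

$$E\{[A(q^{-1})v(n)]\,\varphi_{ee}(n)\} = \sum_{j=0}^{n_a} a_j\,E[v(n-j)\,\varphi_{ee}(n)]$$

Noting that v(n) is white and $y(n) = p(n) + v(n)$ from Figure 2.1, with p(n) and x(n) independent of v(n), it can be seen that $E[v(n-j)\,y(n-i)] = \sigma_v^2$ for $i = j$ and zero otherwise, so the summation will expand as:

$$\sum_{j=0}^{n_a} a_j\,E[v(n-j)\,\varphi_{ee}(n)] = -\sigma_v^2\,[a_1\;\cdots\;a_{n_a}\;\;0\;\cdots\;0]^T \qquad (2.17)$$

Defining the vector a as the $(n_a+n_b+1)$-dimensional vector in (2.17), i.e. as the plant parameter vector $\theta_p$ with the $b_i$ parameters set equal to zero, gives the desired expression for this term:

$$E[A(q^{-1})v(n)\,\varphi_{ee}(n)] = -\sigma_v^2\,a$$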
Thus an alternative expression for the mean-square equation error is:

$$E[\mathrm{ee}^2(n)] = \tilde\theta^T R\,\tilde\theta - 2\sigma_v^2\,\tilde\theta^T a + (1 + a_1^2 + a_2^2 + \cdots + a_{n_a}^2)\,\sigma_v^2 \qquad (2.18)$$


2.2.2  Output  Error  Surface 


To find the expression for the MSE of the output error model, recall equation (2.13), again neglecting the time dependence of the coefficients:

$$\mathrm{oe}(n) = \frac{1}{\hat A(q^{-1})}\,\mathrm{ee}(n)$$


Expanding the operator notation yields:

$$\mathrm{oe}(n) = \mathrm{ee}(n) - \sum_{i=1}^{n_a}\hat a_i\,\mathrm{oe}(n-i)$$


From this expression it is evident that oe(n) is a highly nonlinear function of the $\hat a_i$ parameters, since it is the solution to a difference equation whose coefficients are those parameters. The procedure for finding the explicit expression for $E[\mathrm{oe}^2(n)]$ in terms of the parameters $\hat a_i$ and $\hat b_i$ is given in [Wi85, Ch. 7].


The highly nonlinear nature of the MSE surface in the output error model suggests possible local minima in this surface. Indeed, it is this characteristic of the output error model that causes many algorithms to fail to reach the global minimum. The problem of local minima, together with the inherent stability restrictions of using an IIR adaptive filter, has severely limited output error modelling in practical situations. These issues are examined more closely in the following section.
2.3  Equation  Error  and  Output  Error:  A  Comparison

In any practical situation, an appropriate model for the given problem must be chosen. Therefore it is important to compare the output and equation error models. Three important criteria on which to base this comparison are:

1)  Performance  capability.  In  particular,  it  is  of  interest  to  examine  the 
minimum  achievable  mean  square  error  as  well  as  the  quality  of  the 
parameter  estimates.  In  other  words,  what  is  the  best  performance  that  can 
be  expected  from  the  chosen  model? 

2) Characteristics of the MSE surface. The nature of the MSE surface can drastically affect the ability of an adaptive algorithm to minimize this quantity.

3)  Stability  considerations. 

2.3.1  Performance  Capability 

It is clear that in the problem of system identification, the best achievable performance will be limited by the amount of measurement noise, v(n). This is easiest to see with the output error model of Figure 2.3: if $\hat B(q^{-1}) = B(q^{-1})$ and $\hat A(q^{-1}) = A(q^{-1})$, then $\mathrm{oe}(n) = y(n) - \hat y(n) = v(n)$. Therefore the output error model will give the best achievable performance when $\hat\theta = \theta_p$, yielding the minimum MSE of $\sigma_v^2$. In the literature, e.g. [So88, eq. 2.2], [Lj83, p. 109], v(n) itself is often defined as the output error, since this is the error which results from the output error model with the adaptive filter adjusted so that $\hat\theta = \theta_p$. This can also be seen in (2.14), where $\mathrm{oe}(n) = v(n)$ if $\tilde\theta = 0$. Many adaptive schemes attempt to minimize the MSE, as will be seen in Chapter 3.
Therefore, with the output error model, MSE-minimizing adaptive algorithms that reach the global minimum will provide what are called unbiased estimates of the plant parameters after convergence. The term unbiased refers to parameter estimates, $\hat\theta$, whose ensemble average is equal to the true plant parameters, $\theta_p$; in other words, $E[\hat\theta] = \theta_p$ after convergence. This is a desirable property for an adaptive system to have.

If v(n) appeared by itself as an additive term in the equation error expression (2.7), as it does in the output error expression (2.14), instead of being filtered by $A(q^{-1})$, it could similarly be reasoned that the equation error adaptive system simultaneously provides unbiased estimates and the minimum possible MSE of $\sigma_v^2$. Such a situation would occur if the plant had an ARX ($n_a > 0$, $n_b > 0$) or X ($n_a = 0$, $n_b > 0$) structure, since for these cases:

$$y(n) = \theta_p^T\varphi_{ee}(n) + v(n)$$

$$\hat y(n) = \hat\theta^T\varphi_{ee}(n)$$

and therefore:

$$\mathrm{ee}(n) = y(n) - \hat y(n) = \tilde\theta^T\varphi_{ee}(n) + v(n) \qquad (2.19)$$

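However, for the present, more practical case of the ARMAX plant structure appearing in the equation error adaptive system of Figure 2.2, the issues of minimum MSE and unbiased estimates are not as intuitively clear. Observe in (2.18) that if $\hat\theta = \theta_p$ (i.e. $\tilde\theta = 0$), then $E[\mathrm{ee}^2(n)] = (1 + a_1^2 + a_2^2 + \cdots + a_{n_a}^2)\,\sigma_v^2$. If this MSE were in fact the minimum MSE achievable with the equation error model, then this model would also yield unbiased estimates. A cursory examination of (2.18) does not settle whether this is so, as examination of (2.14) did in the output error case. However, since the MSEE is quadratic in the parameters, the parameter vector yielding the minimum MSEE is easily found by setting the gradient to zero. Minimization will be performed on (2.18), repeated here for convenience:

$$E[\mathrm{ee}^2(n)] = \tilde\theta^T R\,\tilde\theta - 2\sigma_v^2\,\tilde\theta^T a + (1 + a_1^2 + a_2^2 + \cdots + a_{n_a}^2)\,\sigma_v^2$$

Setting the gradient with respect to $\tilde\theta$ equal to zero gives $2R\tilde\theta - 2\sigma_v^2 a = 0$, so the minimizing parameter error is $\tilde\theta^* = \sigma_v^2 R^{-1}a$ and, since $\tilde\theta = \theta_p - \hat\theta$, the minimizing estimate is:

$$\hat\theta^* = \theta_p - \sigma_v^2 R^{-1}a \qquad (2.20)$$

This is an important result. Examining (2.20), it is seen that one of two conditions must be met if the equation error adaptive system of Figure 2.2 is to provide unbiased estimates:

1) $v(n) \equiv 0$

2) $a = 0$.

Condition 2) states that the plant has no autoregressive component, which implies an X plant structure. Further examination of (2.20) shows that as the variance of v(n) increases, the $\hat a_i$ parameter estimates will be increasingly biased. This is the major problem with the equation error adaptive system. The bias term $\sigma_v^2 R^{-1}a$ is confined to the $a_i$ positions when R has no cross-correlation between the delayed outputs and the inputs in $\varphi_{ee}(n)$ (for example, a white input with $n_b = 0$), in which case the $\hat b_i$ estimates are unbiased; in general, however, the $\hat b_i$ estimates are biased as well.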
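The bias predicted by (2.20) is easy to reproduce. The following Python sketch assumes a hypothetical first-order plant, $A(q^{-1}) = 1 - 0.8q^{-1}$ and $B(q^{-1}) = 1 + 0.5q^{-1}$, driven by a white input with $\sigma_v^2 = 1$, and computes the least-squares equation error estimate, i.e. the sample minimizer of the MSEE. Under these assumptions its $\hat a_1$ should differ noticeably from $a_1 = -0.8$ and should agree, up to sampling error, with $\theta_p - \sigma_v^2 R^{-1}a$ evaluated with the sample R; the attained mean-square equation error should also exceed $\sigma_v^2$, in line with (2.21).

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200_000
a1, b0, b1 = -0.8, 1.0, 0.5          # hypothetical plant: A = 1 - 0.8 q^-1, B = 1 + 0.5 q^-1
sigma_v2 = 1.0                       # measurement noise variance
theta_p = np.array([a1, b0, b1])     # plant parameters [a1, b0, b1]

x = rng.standard_normal(N)           # white input
v = np.sqrt(sigma_v2) * rng.standard_normal(N)

p = np.zeros(N)                      # noise-free plant output, zero initial state
p[0] = b0 * x[0]
for n in range(1, N):
    p[n] = -a1 * p[n - 1] + b0 * x[n] + b1 * x[n - 1]
y = p + v

# Rows phi_ee(n)^T = [-y(n-1), x(n), x(n-1)] for n = 1..N-1, targets y(n)
Phi = np.column_stack([-y[:-1], x[1:], x[:-1]])
target = y[1:]

# Least-squares equation error estimate = sample minimizer of the MSEE
theta_ee, *_ = np.linalg.lstsq(Phi, target, rcond=None)

# Prediction of (2.20) using the sample correlation matrix R
R = Phi.T @ Phi / len(target)
a_vec = np.array([a1, 0.0, 0.0])     # theta_p with the b parameters set to zero
theta_pred = theta_p - sigma_v2 * np.linalg.solve(R, a_vec)

mmsee = np.mean((target - Phi @ theta_ee) ** 2)   # attained minimum MSEE
print("theta_p :", theta_p)
print("theta_ee:", np.round(theta_ee, 3))
print("(2.20)  :", np.round(theta_pred, 3))
print("min MSEE:", round(float(mmsee), 3), "vs sigma_v^2 =", sigma_v2)
```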
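It can further be shown that, given neither of the above two conditions is met, the minimum value of the MSEE is always greater than the minimum value of the mean-square output error (MSOE), which is $\sigma_v^2$. Substituting the minimizing parameter error, $\tilde\theta^* = \sigma_v^2 R^{-1}a$, back into (2.18) yields the minimum value of the mean-square equation error:

$$\begin{aligned} E[\mathrm{ee}^2(n)]_{\min} &= [\sigma_v^2R^{-1}a]^T R\,[\sigma_v^2R^{-1}a] + (1 + a_1^2 + \cdots + a_{n_a}^2)\,\sigma_v^2 - 2[\sigma_v^2R^{-1}a]^T\sigma_v^2 a \\ &= \sigma_v^4\,a^TR^{-1}a + (1 + a_1^2 + \cdots + a_{n_a}^2)\,\sigma_v^2 - 2\sigma_v^4\,a^TR^{-1}a \\ &= (1 + a^Ta)\,\sigma_v^2 - \sigma_v^4\,a^TR^{-1}a \\ &= \sigma_v^2 + \sigma_v^2\,a^T[I - \sigma_v^2R^{-1}]\,a \qquad (2.21) \end{aligned}$$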

It can therefore be seen that the mean-square value of the equation error will always be greater than that of the output error if the term in brackets in (2.21) is positive definite. To show this, it is sufficient to take R to be the autocorrelation matrix of the plant output, y(n), and a to be the $n_a \times 1$ vector of the $a_i$ parameters without the zeros appended as in (2.17). Since v(n) is white and $y(n) = p(n) + v(n)$, it can be seen that:

$$R = R_p + \sigma_v^2 I$$

where $R_p$ is the autocorrelation matrix of p(n). Postmultiplying the term in brackets in (2.21) by R will not change its definiteness, since $I - \sigma_v^2R^{-1}$ and R commute and R is positive definite. Performing this operation yields the desired result:

$$[I - \sigma_v^2R^{-1}]\,R = R - \sigma_v^2 I = R_p > 0$$
It can thus be seen that the equation error model, in addition to providing biased estimates of the plant parameters, $\theta_p$, yields a higher value of minimum MSE than the output error model.

Regarding this minimum MSE achievable by the equation error model, it is worth noting a tradeoff that exists in this model. In general, the minimum value of the MSEE can be lowered by increasing $\hat n_a$ above $n_a$. In fact, as $\hat n_a \to \infty$, the minimum MSEE approaches $\sigma_v^2$ [Wi75, Sec. IV]. In other words, the performance of the equation error model can be brought arbitrarily close to this optimum by increasing the order of $\hat A(q^{-1},n)$. This, however, increases the computational burden and memory requirements of the adaptive system.

These results illustrate the superiority of the output error model over the equation error model in the system identification problem with respect to the ability to reach an "optimal" state, which is now seen to mean simultaneous minimization of the MSE and unbiased estimation. This superiority is also intuitively reasonable: unlike the equation error adaptive filter, the output error adaptive filter is an IIR filter, just like the assumed plant, and it seems natural to model an IIR plant with an IIR filter.

2.3.2  Characteristics  of  the  MSE  Surface 

As noted in the previous section, many adaptive algorithms attempt to choose a parameter vector, $\hat\theta$, in an $(n_a+n_b+1)$-dimensional space which minimizes the mean square error. This is implemented recursively: given its last selection for the parameter vector², $\hat\theta(n-1)$, the algorithm picks a new one, $\hat\theta(n)$, which yields a lower MSE. This amounts to effectively "looking down" the MSE surface from an initial estimate, $\hat\theta(0)$, and choosing the parameter at the bottom as the final estimate. Different algorithms do this in different ways, but the main point here is that they are all trying to reach the same "bottom" point. This process works well if the MSE surface is reasonably smooth and has no local minimum points, since "looking down" from any parameter vector in the $(n_a+n_b+1)$-dimensional space will then always lead to the minimum point.

² Always having a "last" parameter estimate implies that the algorithm must be given an initial estimate, $\hat\theta(0)$, before starting.

It was seen in Section 2.2.1 that the equation error model has a quadratic MSE surface. This yields a "bowl-shaped" surface in the parameter space. Therefore the minimum point can be found by "looking down" from any initial point, $\hat\theta(0)$, and the parameter yielding the minimum MSEE will always be reached.

The output error model, however, has a much more complicated MSE surface and, as noted in Section 2.2.2, it may have local minima. Therefore, simply "looking down" the MSE surface may not lead to its global minimum point. Since the algorithm has no way of knowing whether it is at a local or a global minimum, it may converge to either, depending on the initial estimate, $\hat\theta(0)$. By converging to a local minimum point of the MSE surface, the output error model loses its most desirable feature, discussed in the previous section: simultaneously minimizing the MSOE (recall that this minimum value is $\sigma_v^2$) and providing unbiased estimates of the plant parameters, $\theta_p$.

The  lack  of  global  convergence  of  algorithms  trying  to  minimize  output  error  is  the 
major  reason  why  use  of  the  output  error  model  has  been  mainly  in  computer  simulations  in 
research  labs  and  not  in  practical  applications.  Recently,  however,  an  algorithm  has  been 
developed  which  can  provide  unbiased  estimates  of  the  plant  [Fa86].  This  algorithm  uses  a 
criterion  slightly  different  from  the  simplistic  "looking  down  the  MSE  surface"  approach  to 
adapt  the  IIR  filter  of  Figure  2.3.  Since  unbiased  estimates  are  both  a  necessary  and
sufficient  condition  for  minimum  MSOE  in  the  output  error  model,  this  algorithm  therefore
retains  the  desirable  output  error  model  property  of  simultaneous  output  error  minimization
and  unbiased  estimates,  but  does  not  get  stuck  in  local  minima.  This  algorithm  will  be 
studied  in  more  detail  in  Chapter  3. 
2.3.3  Stability  Considerations

Referring to the equation error structure of Figure 2.2, it is seen that this method of system identification requires two FIR filters for its implementation. Since FIR filters have no feedback, they are always stable, and no stability check is needed during adaptation. Adaptive algorithms also perform particularly well when adapting FIR filters, with respect to speed of convergence to the parameter estimate, $\hat\theta$, yielding the minimum MSEE.

The output error model, on the other hand, requires an IIR adaptive filter, as shown in Figure 2.3. Because IIR filters have poles as well as zeros, a stability constraint is imposed on the $\hat A(q^{-1},n)$ polynomial. This requires that every new estimate, $\hat\theta(n)$, generated by the adaptive algorithm be checked to see whether the resulting $\hat A(q^{-1},n)$ is stable (i.e. has all its roots inside the unit circle). Factoring the $\hat A(q^{-1},n)$ polynomial to determine its roots is a major computational burden for large $\hat n_a$; however, methods have been devised [Ju64] which can check a discrete-time polynomial for the presence of unstable roots, similar to the Routh-Hurwitz test for continuous-time polynomials. This eliminates the need to factor $\hat A(q^{-1},n)$. If it is determined to possess unstable roots, then the unstable estimate

$$\hat\theta_{unst}(n) = \hat\theta(n-1) + \Delta\hat\theta(n)$$

should be replaced with the stable estimate

$$\hat\theta_{st}(n) = \hat\theta(n-1) + \rho(n)\,\Delta\hat\theta(n), \qquad 0 \le \rho(n) < 1 \qquad (2.22)$$

This process is known as stability projection [Lj83, Sec. 6.6]. Note that the choice $\rho(n) = 0$ corresponds to effectively "throwing out" the unstable estimate, $\hat\theta_{unst}(n)$, by setting $\hat\theta(n) = \hat\theta(n-1)$.
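Root checks of this kind can be done without factoring. The sketch below uses the Schur-Cohn step-down recursion (reflection coefficients), which is one standard test of this type; it is not necessarily the tabular procedure of [Ju64]. For a monic $\hat A(z) = 1 + \hat a_1 z^{-1} + \cdots + \hat a_m z^{-m}$, all roots lie strictly inside the unit circle if and only if every reflection coefficient produced by the recursion has magnitude less than one. The function name and the example polynomials are illustrative.

```python
def is_stable(a):
    """Stability check for A(z) = 1 + a[0] z^-1 + ... + a[m-1] z^-m without factoring.

    Schur-Cohn step-down recursion: with a_m as the reflection coefficient k_m,
    all roots lie strictly inside the unit circle iff |k_m| < 1 at every order.
    """
    coeffs = [float(c) for c in a]     # a_1 .. a_m of the current-order polynomial
    while coeffs:
        m = len(coeffs)
        k = coeffs[-1]                 # reflection coefficient k_m = a_m
        if abs(k) >= 1.0:
            return False
        # Step down one order: a_i <- (a_i - k * a_{m-i}) / (1 - k^2), i = 1..m-1
        coeffs = [(coeffs[j] - k * coeffs[m - 2 - j]) / (1.0 - k * k) for j in range(m - 1)]
    return True

# Illustrative polynomials (coefficients a_1, a_2, ...), with their roots in z
examples = {
    "roots 0.7, 0.8": [-1.5, 0.56],
    "roots 0.8, 1.1": [-1.9, 0.88],
    "roots 0.5, 0.5, -0.5": [-0.5, -0.25, 0.125],
    "roots 0.5, 0.5, -1.5": [0.5, -1.25, 0.375],
}
for label, coeffs in examples.items():
    print(label, "->", "stable" if is_stable(coeffs) else "unstable")
```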

An example of the process of stability projection for the case $\hat n_a = 2$ is shown in Figure 2.4. It is shown in [Ha86, Sec. 2.8] that stability is maintained if and only if the point $(\hat a_1, \hat a_2)$ lies in the triangular region shown, namely $|\hat a_2| < 1$ and $|\hat a_1| < 1 + \hat a_2$. Given the previous estimate, $\hat\theta(n-1)$, the current estimate as generated by the adaptive algorithm, $\hat\theta_{unst}(n)$, is shown to lie outside of the stability region. To generate the estimate $\hat\theta_{st}(n)$, $\Delta\hat\theta(n)$ is repeatedly multiplied by a constant μ, $0 < \mu < 1$, until $\hat\theta(n)$ lies inside the triangular stability region. This yields a value for ρ(n) in (2.22) of $\rho(n) = \mu^k$, where k is the number of times that $\Delta\hat\theta(n)$ had to be multiplied by μ. Typically, the choice μ = 0.5 works well. In the example in Figure 2.4, it is seen that after multiplying $\Delta\hat\theta(n)$ by μ once, the resulting parameter, $\hat\theta_1(n)$, is still unstable. Multiplying again by μ yields the final stable estimate, $\hat\theta_{st}(n)$.
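The projection procedure just described can be sketched for $\hat n_a = 2$ as follows. The parameter layout $[\hat a_1, \hat a_2, \hat b_0, \ldots]$, the default μ = 0.5, and the cap on the number of shrink steps (after which the update is discarded, i.e. ρ(n) = 0) are assumptions made for illustration. In the example the first shrink still leaves $(\hat a_1, \hat a_2)$ outside the triangle and the second brings it inside, giving $\rho(n) = \mu^2 = 0.25$, the same pattern as described for Figure 2.4.

```python
def in_triangle(a1, a2):
    """Stability triangle for A(z) = 1 + a1 z^-1 + a2 z^-2: |a2| < 1 and |a1| < 1 + a2."""
    return abs(a2) < 1.0 and abs(a1) < 1.0 + a2

def stability_projection(theta_prev, delta, mu=0.5, max_shrinks=30):
    """Sketch of (2.22) for n_a = 2, with theta = [a1, a2, b0, ...].

    Multiplies the update delta by mu until (a1, a2) lies in the triangle and
    returns (theta_new, rho) with rho = mu**k. If the cap is reached (an
    assumption of this sketch), the update is discarded, i.e. rho = 0.
    """
    rho = 1.0
    for _ in range(max_shrinks + 1):
        candidate = [t + rho * d for t, d in zip(theta_prev, delta)]
        if in_triangle(candidate[0], candidate[1]):
            return candidate, rho
        rho *= mu
    return list(theta_prev), 0.0

theta_prev = [-1.0, 0.5, 1.0]   # stable previous estimate (hypothetical)
delta = [-1.4, 0.2, 0.1]        # proposed update that leaves the triangle
theta_new, rho = stability_projection(theta_prev, delta)
print(theta_new, rho)
```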


2.4  Summary 

This chapter has illustrated in detail two very common examples of the two-step modelling process of system identification described in Chapter 1: the equation-error model and the output-error model. The first step, selecting the plant structure, was the same for both models. In particular, an ARMAX plant structure with $C(q^{-1}) = A(q^{-1})$ was selected. This structure corresponds to a plant modelled as having a rational transfer function with its output corrupted by additive white measurement noise (Figure 2.1). The second step, selecting the adaptive filter structure, is what distinguishes the output-error model from the equation-error model.

The equation-error model simply uses the observable plant signals x(n) and y(n) as inputs to separate FIR filters, generating the adaptive filter output, $\hat y(n)$, as in (2.2). This yields an X structure for the adaptive filter. The error signal for this model, $\mathrm{ee}(n) = y(n) - \hat y(n)$, was shown to be linear in the parameters, $\hat\theta$, yielding a quadratic MSE surface. This type of MSE surface is very desirable because it is mathematically well-behaved and, unlike the output-error MSE surface, does not contain local minima. However, the price paid for the filter simplicity and quadratic error surface is biased estimates, shown in (2.20), and a minimum MSE which is greater than the measurement noise variance, shown in (2.21).

The output-error model employs an IIR adaptive filter, yielding an ARX adaptive filter structure. This model has the ability to simultaneously provide unbiased estimates and the optimal minimum MSE of $\sigma_v^2$. However, the tradeoff is that its MSE surface is highly nonlinear and can have local minima. Furthermore, the adaptive IIR filter must be checked for stability after every parameter update before proceeding. This introduces the additional computational burdens of stability determination and projection.
CHAPTER  3


ADAPTIVE  ALGORITHMS 


In this chapter, recursive algorithms which implement the process of "looking down" the MSE surface, as described in Section 2.3.2, are developed in Sections 3.1 and 3.2. In Sections 3.3 and 3.4, algorithms are presented which use different criteria to identify the plant parameters, $\theta_p$.


3.1  Gradient-Based  Methods


The  simplest  approach  to  recursively  find  a  minimum  point  of  a  surface  is  called  the 
steepest  descent  algorithm.  This  method  is  described  by  the  following  3-step  procedure: 

1) Locate the direction in which the surface is descending most rapidly from the last parameter estimate, $\hat\theta(n-1)$.

2) Choose the current estimate $\hat\theta(n)$ as the estimate resulting from taking a small step away from $\hat\theta(n-1)$ in the direction determined in step 1).

3)  Go  back  to  step  1). 


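Mathematically, the direction of step 1) above is related to the gradient of the MSE surface. Consider a function $f: \mathbb{R}^n \to \mathbb{R}$. The gradient of f at a point $x \in \mathbb{R}^n$, denoted $\nabla f(x)$, is a generalization of the derivative of a function of one variable, and is defined as:

$$\nabla f(x) = \left[\frac{\partial f(x)}{\partial x_1}\;\;\frac{\partial f(x)}{\partial x_2}\;\;\cdots\;\;\frac{\partial f(x)}{\partial x_n}\right]^T$$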
Given a point $x_0$ on the f(x) versus x surface, the direction in which the slope is maximum is precisely the direction of the gradient vector $\nabla f(x_0)$. Furthermore, the direction of minimum (most negative) slope is opposite to the direction of the gradient vector [Fl77, Sec. 3.5].


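The gradient of the MSE surface, denoted by $\nabla E[e^2(n)]$ (recall that $E[e^2(n)]$ is a function of $\hat\theta$), is thus defined as the vector of its partial derivatives with respect to the parameters $\hat a_i$, $i = 1, \ldots, n_a$, and $\hat b_i$, $i = 0, \ldots, n_b$, as follows:

$$\nabla E[e^2(n)] = \left[\frac{\partial E[e^2(n)]}{\partial \hat a_1}\;\cdots\;\frac{\partial E[e^2(n)]}{\partial \hat a_{n_a}}\;\;\frac{\partial E[e^2(n)]}{\partial \hat b_0}\;\cdots\;\frac{\partial E[e^2(n)]}{\partial \hat b_{n_b}}\right]^T$$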


Since the direction of minimum slope is $-\nabla E[e^2(n)]$, the steepest descent method can be expressed recursively as:

$$\theta(n) = \theta(n-1) - \mu \nabla E[e^2(n)] \qquad (3.1)$$

where $\mu$ is a small stepsize, which determines whether and how fast the algorithm will converge. It is shown in [Ha86, Sec. 5.4], for the equation error adaptive system, that the steepest descent algorithm will converge if $0 < \mu < 2/\lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of the correlation matrix $R = E[\varphi_{ee}(n)\varphi_{ee}^T(n)]$. This bound assumes, as in [Ha86], that the factor 2 arising in the gradient of the quadratic MSE surface is absorbed into $\mu$. Also, in general for gradient-type algorithms, the rate of convergence is proportional to $\mu$.
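The sketch below iterates the steepest descent recursion on a quadratic MSE surface $c - 2p^T\theta + \theta^T R\theta$. The values of `R` and `p` are made up for illustration. The update is written as $\theta + \mu(p - R\theta)$, so the factor 2 of the gradient is absorbed into $\mu$. This is the convention under which the bound $0 < \mu < 2/\lambda_{max}$ holds.

```python
import numpy as np

def steepest_descent(R, p, mu, n_iter):
    """Iterate theta <- theta + mu (p - R theta), starting from theta = 0.

    p - R theta equals -1/2 times the gradient of c - 2 p^T theta + theta^T R theta,
    i.e. the factor 2 is absorbed into mu."""
    theta = np.zeros(len(p))
    for _ in range(n_iter):
        theta = theta + mu * (p - R @ theta)
    return theta

R = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical correlation matrix
p = np.array([1.0, 0.3])                 # hypothetical cross-correlation vector
theta_opt = np.linalg.solve(R, p)        # bottom of the quadratic MSE surface
lam_max = np.linalg.eigvalsh(R).max()

theta_stable = steepest_descent(R, p, mu=1.0 / lam_max, n_iter=200)    # inside the bound
theta_unstable = steepest_descent(R, p, mu=2.2 / lam_max, n_iter=200)  # outside the bound
```

With $\mu = 1/\lambda_{max}$ the iterates settle on $R^{-1}p$. With $\mu = 2.2/\lambda_{max}$ the error component along the eigenvector of $\lambda_{max}$ is multiplied by $1 - 2.2 = -1.2$ at every step, so it grows without bound.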

Recall from Chapter 1, however, that adaptive filtering applications are precisely those in which the environment of the adaptive filter (i.e., the plant in the case of system identification) is unknown and/or changing. This makes taking expectations difficult, if not impossible. Therefore, in order to design a practical algorithm, the expectation operator must be dispensed with, yielding approximate or instantaneous gradient methods. There is also an important theoretical justification for doing this: $\nabla e^2(n)$ is an unbiased estimate of $\nabla E[e^2(n)]$, since $E[\nabla e^2(n)] = \nabla E[e^2(n)]$ whenever differentiation and expectation can be interchanged. In the literature, these methods are often referred to as stochastic gradient methods [Ha86, Sec. 5.3]. These algorithms thus have the form:

$$\theta(n) = \theta(n-1) - \mu \nabla e^2(n) \qquad (3.2)$$

In order to implement (3.2), the gradient must be determined. The gradient will have a different form depending on which system identification model is used (i.e., output error or equation error). These two gradient expressions are now derived.

3.1.1 The Equation Error Stochastic Gradient Algorithm

To find $\nabla ee^2(n)$, the chain rule [Fl77, Sec. 4.4] is first applied:

$$\nabla ee^2(n) = 2\,ee(n)\,\nabla ee(n) \qquad (3.3)$$

Recall:

$$ee(n) = y(n) - \hat{y}(n) = y(n) - \theta^T(n-1)\varphi_{ee}(n)$$

Taking derivatives with respect to the parameters $\theta(n-1)$, and noting that $y(n)$ is independent of $\theta(n-1)$ and thus $\nabla y(n) = 0$, the following expression for the gradient of $ee(n)$ is obtained:

$$\nabla ee(n) = -\varphi_{ee}(n) \qquad (3.4)$$

Substituting (3.4) into (3.3) and then into (3.2) yields the following expression for the equation error stochastic gradient algorithm. This algorithm is known as the LMS algorithm, which was developed by Widrow and Hoff [Wi75], [Wi76], [Ha86, Ch. 5], [Wi85, Ch. 6]:

$$\theta(n) = \theta(n-1) + \mu\,\varphi_{ee}(n)\,ee(n) \qquad (3.5)$$

Note that the constant "2" has been absorbed into the stepsize. Convergence requirements for this algorithm are similar to those of the steepest descent method. In particular [Ha86, Sec. 5.12, Prop. 2], for mean-square convergence of the parameters $\theta(n-1)$:

$$0 < \mu < \frac{2}{\sum_i \lambda_i} = \frac{2}{\mathrm{Tr}(R)} = \frac{2}{\text{Input Power}}$$

where, as before, the $\lambda_i$'s are the eigenvalues of the correlation matrix $R$.
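The following is a minimal sketch of the LMS recursion (3.5) on a hypothetical second-order ARX plant. The stepsize is set to one tenth of the bound $2/\mathrm{Tr}(R)$, with $\mathrm{Tr}(R)$ estimated as the average squared norm of the regressor. Several choices are assumptions made so that the example converges cleanly: the plant coefficients, the noise-free data, and the 0.1 safety factor. None of them are taken from the text.

```python
import numpy as np

def simulate_arx(a, b, x, e=None):
    """A(q^-1) y(n) = B(q^-1) x(n) + e(n), zero initial conditions."""
    e = np.zeros(len(x)) if e is None else e
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = e[n] + sum(b[j] * x[n - j] for j in range(len(b)) if n - j >= 0)
        acc -= sum(a[i - 1] * y[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
        y[n] = acc
    return y

def regressor(y, x, n, na, nb):
    """phi_ee(n) = [-y(n-1) ... -y(n-na), x(n) ... x(n-nb)]^T, zeros before n = 0."""
    past_y = [-y[n - i] if n - i >= 0 else 0.0 for i in range(1, na + 1)]
    past_x = [x[n - j] if n - j >= 0 else 0.0 for j in range(nb + 1)]
    return np.array(past_y + past_x)

rng = np.random.default_rng(1)
a_true, b_true = [-0.5, 0.2], [1.0, 0.5]      # hypothetical ARX plant
x = rng.standard_normal(20000)
y = simulate_arx(a_true, b_true, x)           # noise-free for clarity
Phi = np.array([regressor(y, x, n, 2, 1) for n in range(len(x))])

trace_R = np.mean(np.sum(Phi ** 2, axis=1))   # estimate of Tr(R), the total input power
mu = 0.1 * 2.0 / trace_R                      # well inside 0 < mu < 2/Tr(R)

theta = np.zeros(4)
for phi, yn in zip(Phi, y):
    ee = yn - theta @ phi                     # equation error using theta(n-1)
    theta = theta + mu * phi * ee             # LMS update (3.5)
```

Under these assumptions `theta` approaches $[a_1, a_2, b_0, b_1] = [-0.5, 0.2, 1.0, 0.5]$.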

The LMS algorithm is the most widely used adaptive algorithm because of its computational simplicity. Furthermore, since it uses the mean-squared equation error, it possesses the desirable characteristics of the equation error model discussed in Section 2.3 (i.e., a unimodal error surface and no adaptive filter stability problems). Thus, even though the equation error model does not provide the optimal MMSE or unbiased estimates of $\theta_p$, it is dependable and does in fact perform satisfactorily in a wide variety of applications [Wi75].


3.1.2 The Output Error Stochastic Gradient Algorithm

Proceeding as in the previous section, the gradient of $oe^2(n)$ must be determined. As before, the chain rule yields:

$$\nabla oe^2(n) = 2\,oe(n)\,\nabla oe(n)$$

The matrix expression for the output error is:

$$oe(n) = y(n) - \hat{y}(n) = y(n) - \theta^T(n-1)\varphi_{oe}(n)$$

As before, $\nabla y(n) = 0$, so the expression for the gradient is:

$$\nabla oe(n) = -\nabla\left[\theta^T(n-1)\varphi_{oe}(n)\right]$$

This gradient cannot be evaluated as simply as (3.4) in the equation error case, because the $\hat{y}$ terms in $\varphi_{oe}(n)$ depend on the parameters $\theta$. Expanding the matrix notation yields:

$$-\theta^T(n-1)\varphi_{oe}(n) = a_1(n-1)\hat{y}(n-1) + \cdots + a_{n_a}(n-1)\hat{y}(n-n_a) - b_0(n-1)x(n) - \cdots - b_{n_b}(n-1)x(n-n_b)$$
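In the following, the time index $(n-1)$ associated with the adaptive filter parameters $a_i$ and $b_i$ will be omitted for convenience. The appropriate derivatives for the gradient vector must now be taken, using the chain rule since $\hat{y}(n)$ is not a constant with respect to the parameters [Jo84, Sec. III.A]:

$$-\frac{\partial\,\theta^T\varphi_{oe}(n)}{\partial a_i} = a_1\frac{\partial \hat{y}(n-1)}{\partial a_i} + \cdots + a_i\frac{\partial \hat{y}(n-i)}{\partial a_i} + \hat{y}(n-i) + \cdots + a_{n_a}\frac{\partial \hat{y}(n-n_a)}{\partial a_i} = \left[A(q^{-1}) - 1\right]\frac{\partial \hat{y}(n)}{\partial a_i} + \hat{y}(n-i) \qquad (3.6)$$

$$-\frac{\partial\,\theta^T\varphi_{oe}(n)}{\partial b_i} = a_1\frac{\partial \hat{y}(n-1)}{\partial b_i} + \cdots + a_{n_a}\frac{\partial \hat{y}(n-n_a)}{\partial b_i} - x(n-i) = \left[A(q^{-1}) - 1\right]\frac{\partial \hat{y}(n)}{\partial b_i} - x(n-i) \qquad (3.7)$$

From the expressions (3.6) and (3.7), it is seen that:

$$\nabla oe(n) = -\nabla\left[\theta^T\varphi_{oe}(n)\right] = -\varphi_{oe}(n) + \left[A(q^{-1}) - 1\right]\left[\frac{\partial \hat{y}(n)}{\partial a_1} \;\cdots\; \frac{\partial \hat{y}(n)}{\partial a_{n_a}} \;\; \frac{\partial \hat{y}(n)}{\partial b_0} \;\cdots\; \frac{\partial \hat{y}(n)}{\partial b_{n_b}}\right]^T \qquad (3.8)$$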


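Since $oe(n) = y(n) - \hat{y}(n)$, we have $\nabla oe(n) = -\nabla\hat{y}(n)$. This is the negative of the vector in the second term on the right of (3.8). Using this fact, (3.8) can be rewritten as $\nabla oe(n) = -\varphi_{oe}(n) - [A(q^{-1}) - 1]\nabla oe(n)$, that is, $A(q^{-1})\nabla oe(n) = -\varphi_{oe}(n)$. Restoring the time index of the adaptive filter polynomial, this gives:

$$\nabla oe(n) = -\frac{1}{A(q^{-1},n-1)}\varphi_{oe}(n) \qquad (3.9)$$

Substituting (3.9) into the chain rule expression for $\nabla oe^2(n)$ and then into (3.2), again absorbing the constant "2" into the stepsize, yields the output error stochastic gradient algorithm:

$$\theta(n) = \theta(n-1) + \mu\,\frac{1}{A(q^{-1},n-1)}\varphi_{oe}(n)\,oe(n) \qquad (3.10)$$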
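Because (3.9) is exact when the filter parameters are held fixed, it can be checked numerically. The sketch below filters two entries of $\varphi_{oe}(n)$ by $1/A(q^{-1})$. It then compares them with central-difference derivatives of the whole output sequence $\hat{y}(n)$. The coefficients and the helper `iir_filter` are illustrative assumptions.

```python
import numpy as np

def iir_filter(a, b, u):
    """s(n) = -sum_i a_i s(n-i) + sum_j b_j u(n-j), zero initial conditions."""
    s = np.zeros(len(u))
    for n in range(len(u)):
        acc = sum(b[j] * u[n - j] for j in range(len(b)) if n - j >= 0)
        acc -= sum(a[i - 1] * s[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
        s[n] = acc
    return s

rng = np.random.default_rng(2)
x = rng.standard_normal(200)
a = np.array([-0.6, 0.25])       # hypothetical fixed A(q^-1) coefficients
b = np.array([1.0, 0.4])         # hypothetical fixed B(q^-1) coefficients
yhat = iir_filter(a, b, x)       # output of the output error adaptive filter

# Entries of phi_oe(n) belonging to a_1 and b_0: -yhat(n-1) and x(n).
phi_a1 = -np.concatenate(([0.0], yhat[:-1]))
phi_b0 = x

# (3.9): d yhat(n)/d theta = -grad oe(n) = phi_oe(n) filtered by 1/A(q^-1).
dy_da1 = iir_filter(a, [1.0], phi_a1)
dy_db0 = iir_filter(a, [1.0], phi_b0)

# Brute-force check by central differences on the whole output sequence.
h = 1e-6
fd_da1 = (iir_filter(a + [h, 0.0], b, x) - iir_filter(a - [h, 0.0], b, x)) / (2 * h)
fd_db0 = (iir_filter(a, b + [h, 0.0], x) - iir_filter(a, b - [h, 0.0], x)) / (2 * h)
```

When the parameters change at every step, as in (3.10), the same filtering is only an approximation. This is why the text speaks of slowly time-varying parameters.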


It is important to note a significant difference between the output error algorithm of (3.10) and the equation error algorithm of (3.5). The algorithm of (3.5) uses the equation error regressor vector, $\varphi_{ee}(n)$, "as is." The algorithm of (3.10), by contrast, requires the output error regressor vector, $\varphi_{oe}(n)$, to be filtered by the autoregressive polynomial of the adaptive filter. This characteristic of regressor filtering is very common among algorithms which have been developed for adaptive IIR filters [Jo84], [Fa86], [So88]. Note that the autoregressive filtering by $A(q^{-1},n-1)$ in (3.10) is applied to a vector. This implies that $n_a$ past values of $\varphi_{oe}(n)$, a total of $n_a(n_a+n_b+1)$ values, must be retained in memory to

* This result can be arrived at more simply by considering the output error expression (2.11), $oe(n) = \frac{1}{A(q^{-1})}\tilde{\theta}^T(n-1)\varphi_{oe}(n) + v(n)$. Taking derivatives with respect to $\theta(n-1)$ yields $\nabla oe(n) = -\frac{1}{A(q^{-1})}\varphi_{oe}(n)$. Since $A(q^{-1})$ is unknown, the best that can be done is to use its latest estimate, $A(q^{-1},n-1)$. This substitution yields (3.9). This is a common practical way of dealing with filtered quantities and will be used later.
accomplish this operation. Assuming slowly time-varying adaptive filter parameters, this memory requirement can be reduced by implementing the filtering process in an approximate manner [Jo84, Sec. III.A], [So88, Remark 1]. First define the quantities:

$$y^F(n) = \frac{1}{A(q^{-1},n-1)}\hat{y}(n), \qquad x^F(n) = \frac{1}{A(q^{-1},n-1)}x(n)$$

The following approximation can now be used in (3.10):

$$\frac{1}{A(q^{-1},n-1)}\varphi_{oe}(n) \approx \left[-y^F(n-1) \;\cdots\; -y^F(n-n_a) \;\; x^F(n) \;\cdots\; x^F(n-n_b)\right]^T$$

This approximate filtered-regressor expression requires that only $(n_a+n_b+1)$ values of $y^F(n)$ and $x^F(n)$ be stored in memory.
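The following is a rough sketch of the output error stochastic gradient algorithm (3.10) using the approximate filtered regressor above. At each step, $x^F(n)$ and $y^F(n)$ are produced using only the coefficients $a_i(n-1)$, as in the approximation. Three choices are hypothetical: the first-order plant, the noise-free data, and $\mu = 0.01$. They are made so that the MSOE surface is unimodal (a sufficient-order case with white input) and the example converges.

```python
import numpy as np

def iir_filter(a, b, u):
    """s(n) = -sum_i a_i s(n-i) + sum_j b_j u(n-j), zero initial conditions."""
    s = np.zeros(len(u))
    for n in range(len(u)):
        acc = sum(b[j] * u[n - j] for j in range(len(b)) if n - j >= 0)
        acc -= sum(a[i - 1] * s[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
        s[n] = acc
    return s

def oe_lms(x, y, na, nb, mu):
    """Output error stochastic gradient (3.10) with the approximate filtered
    regressor [-y^F(n-1) ... -y^F(n-na), x^F(n) ... x^F(n-nb)]^T."""
    N = len(x)
    theta = np.zeros(na + nb + 1)            # [a_1 .. a_na, b_0 .. b_nb]
    yhat, yF, xF = np.zeros(N), np.zeros(N), np.zeros(N)
    past = lambda s, k: s[k] if k >= 0 else 0.0
    for n in range(N):
        a = theta[:na].copy()                # coefficients of A(q^-1, n-1)
        phi = np.array([-past(yhat, n - i) for i in range(1, na + 1)]
                       + [past(x, n - j) for j in range(nb + 1)])
        yhat[n] = theta @ phi                # adaptive filter output
        oe = y[n] - yhat[n]                  # output error
        xF[n] = x[n] - sum(a[i - 1] * past(xF, n - i) for i in range(1, na + 1))
        psi = np.array([-past(yF, n - i) for i in range(1, na + 1)]
                       + [past(xF, n - j) for j in range(nb + 1)])
        theta = theta + mu * psi * oe        # update (3.10)
        # y^F(n) is built with the same A(q^-1, n-1); it enters psi from time n+1.
        yF[n] = yhat[n] - sum(a[i - 1] * past(yF, n - i) for i in range(1, na + 1))
    return theta

rng = np.random.default_rng(3)
x = rng.standard_normal(20000)
y = iir_filter([-0.5], [1.0, 0.5], x)        # hypothetical noise-free OE plant
theta_oe = oe_lms(x, y, na=1, nb=1, mu=0.01)
```

Only the scalar histories `yF` and `xF` are needed for the regressor, which reflects the memory saving described above. This sketch keeps whole arrays for readability.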


As mentioned in Section 2.3, the possible multimodality of the MSOE surface is the major drawback of the output error model, because it can cause (3.10) to stall in a local minimum. This fact can be seen more precisely now in terms of the gradient vector. At any local minimum point, the partial derivatives with respect to every variable are zero. Thus $\nabla E[oe^2(n)] = 0$, and it can be seen from (3.1) that $\theta(n) = \theta(n-1)$. In other words, gradient algorithms effectively "turn off" at local minima. Note that in addition to "turning off" at local minima, gradient algorithms will also "turn off" at local maxima and inflection points. However, using the instantaneous gradient of (3.2) prevents the second term of the algorithm from staying at 0. Therefore, at local maxima and inflection points, the noisy gradient estimate will perturb the parameter estimates, $\theta$, slightly past these points. The algorithm will then continue "looking down" until it reaches a minimum point (global or local). In the literature dealing with convergence issues, the points of minimum MSE are referred to as stable [Mi82, Sec. 5.3] convergence points of the gradient algorithm (3.1). When the practical stochastic gradient algorithm (3.10) yields parameter estimates near these points, the estimates will oscillate about them because of the noisy gradient estimates. They do not move away, as they do at local maxima and inflection points.

3.2 Least Squares Methods

In the least squares (LS) scheme, it is desired to do more than merely move the estimate, $\theta$, in the direction of the minimum of the mean-square error surface. In particular, the least squares estimate is the one which minimizes, at every time instant $n$, the following criterion:

$$J(n) = \sum_{i=1}^{n} e^2(i) \qquad (3.11)$$

This can be explained more intuitively as follows [Ha86, Ch. 7]. Suppose we are given some plant input/output sequences $\{x(i)\}_{i=1}^{n}$, $\{y(i)\}_{i=1}^{n}$, and a constant parameter estimate $\theta(n)$. The output sequence of the adaptive filter, $\{\hat{y}(i)\}_{i=1}^{n}$, will produce the error sequence $\{y(i) - \hat{y}(i)\}_{i=1}^{n}$. This error sequence will produce a corresponding value of $J(n)$. The LS estimate, $\theta_{LS}(n)$, is the one which, when held constant through the interval $i = 1, \ldots, n$ as above, yields the minimum value of $J(n)$. The least squares method is seen to be a deterministic method, in that no statistical assumptions or approximations have been made, as in the gradient methods.* Minimization thus takes place assuming only the plant input/output record from the initial time up to the current time.

* It should be noted here that minimization of $\sum_{i=1}^{n} e^2(i)$ is equivalent to minimizing $\frac{1}{n}\sum_{i=1}^{n} e^2(i)$, which by the law of large numbers approaches $E[e^2(n)]$ as $n \to \infty$. Therefore, statistical methods based on the MSE surface are asymptotically equivalent to least squares methods. In other words, both types of algorithms will converge to the same point.

Consider the minimization with respect to $\theta(n)$ of the criterion (3.11), repeated here:

$$J_{LS}(n) = \sum_{i=1}^{n} e^2(i)$$


The chain rule yields the derivative:

$$\frac{\partial J_{LS}(n)}{\partial \theta(n)} = 2\sum_{i=1}^{n} e(i)\,\frac{\partial e(i)}{\partial \theta(n)} \qquad (3.12)$$

Setting this derivative to zero and solving for $\theta(n)$ yields the LS estimate based on $n$ observations, $\theta_{LS}(n)$. To determine its value, the explicit expression for the error, $e(i)$, must be used. Thus, in a similar fashion as in Section 3.1, both the equation error and output error expressions will be utilized in (3.12) to derive adaptive algorithms for the equation error and output error adaptive systems.


3.2.1 The Equation Error Least Squares Algorithm

Using the matrix expression (2.3) for the output of the adaptive filter yields the desired expression for the equation error at the $i$th iteration:

$$ee(i) = y(i) - \hat{y}(i) = y(i) - \theta^T(n)\varphi_{ee}(i)$$

As seen in Section 3.1.1, the derivative of this quantity is:

$$\frac{\partial\, ee(i)}{\partial \theta(n)} = \nabla^T ee(i) = -\varphi_{ee}^T(i)$$

Substituting these expressions into (3.12) and equating to zero yields the least squares estimate of $\theta_p$ based on $n$ observations, $\theta_{LS}(n)$:

$$\sum_{i=1}^{n}\left[y(i) - \theta_{LS}^T(n)\varphi_{ee}(i)\right]\varphi_{ee}^T(i) = 0$$

Solving for $\theta_{LS}(n)$ yields:
$$\theta_{LS}^T(n)\sum_{i=1}^{n}\varphi_{ee}(i)\varphi_{ee}^T(i) = \sum_{i=1}^{n} y(i)\varphi_{ee}^T(i) \qquad (3.13)$$

$$\theta_{LS}(n) = \left[\sum_{i=1}^{n}\varphi_{ee}(i)\varphi_{ee}^T(i)\right]^{-1}\sum_{i=1}^{n} y(i)\varphi_{ee}(i) \qquad (3.14)$$

Even though (3.14) is a valid expression for the LS estimate, note that it is not recursive. That is, (3.14) does not express $\theta_{LS}(n)$ in terms of only the previous estimate and the data at the current time instant. Instead, it requires all the data from the starting time to the current time.
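As a check on (3.14), the sketch below forms the two sums as matrix products over a stacked regressor matrix and solves the normal equations. It uses `np.linalg.solve` rather than an explicit inverse. The ARX plant, the white equation noise, and the data length are assumptions for illustration. White noise is chosen because under it the equation error LS estimate is unbiased.

```python
import numpy as np

def simulate_arx(a, b, x, e):
    """A(q^-1) y(n) = B(q^-1) x(n) + e(n), zero initial conditions."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = e[n] + sum(b[j] * x[n - j] for j in range(len(b)) if n - j >= 0)
        acc -= sum(a[i - 1] * y[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
        y[n] = acc
    return y

def regressor(y, x, n, na, nb):
    """phi_ee(n) = [-y(n-1) ... -y(n-na), x(n) ... x(n-nb)]^T, zeros before n = 0."""
    past_y = [-y[n - i] if n - i >= 0 else 0.0 for i in range(1, na + 1)]
    past_x = [x[n - j] if n - j >= 0 else 0.0 for j in range(nb + 1)]
    return np.array(past_y + past_x)

rng = np.random.default_rng(4)
N = 5000
x = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)                   # white equation noise (assumed)
y = simulate_arx([-0.5, 0.2], [1.0, 0.5], x, e)    # hypothetical ARX plant
Phi = np.array([regressor(y, x, n, 2, 1) for n in range(N)])   # row n is phi_ee^T(n)

R_n = Phi.T @ Phi                      # sum_i phi_ee(i) phi_ee^T(i)
r_n = Phi.T @ y                        # sum_i y(i) phi_ee(i)
theta_ls = np.linalg.solve(R_n, r_n)   # (3.14), without forming the inverse
```

The same estimate is returned by a generic least squares solver applied to `Phi` and `y`. This makes plain that (3.14) is an ordinary linear regression on the regressor vectors.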

It is possible to put (3.14) in recursive form as follows [Lj83, Sec. 2.2.1] by defining the term which is inverted in (3.14) as:

$$R(n) = \sum_{i=1}^{n}\varphi_{ee}(i)\varphi_{ee}^T(i) = R(n-1) + \varphi_{ee}(n)\varphi_{ee}^T(n) \qquad (3.15)$$

Using this definition in (3.13) and transposing yields:

$$R(n)\theta_{LS}(n) = \sum_{i=1}^{n} y(i)\varphi_{ee}(i) \qquad (3.16)$$

The summation above can be expanded to give:

$$R(n)\theta_{LS}(n) = \sum_{i=1}^{n-1} y(i)\varphi_{ee}(i) + y(n)\varphi_{ee}(n)$$

Applying (3.16) at time $n-1$ to the first term on the right above yields:

$$R(n)\theta_{LS}(n) = R(n-1)\theta_{LS}(n-1) + y(n)\varphi_{ee}(n)$$

Now solving (3.15) for $R(n-1)$ and substituting in the above expression yields:

$$R(n)\theta_{LS}(n) = \left[R(n) - \varphi_{ee}(n)\varphi_{ee}^T(n)\right]\theta_{LS}(n-1) + y(n)\varphi_{ee}(n) = R(n)\theta_{LS}(n-1) + \varphi_{ee}(n)\left[y(n) - \varphi_{ee}^T(n)\theta_{LS}(n-1)\right]$$

Finally, premultiplying by $R^{-1}(n)$ and recognizing that the term in brackets is the equation error, $ee(n)$, gives the desired result:

$$\theta_{LS}(n) = \theta_{LS}(n-1) + R^{-1}(n)\varphi_{ee}(n)\,ee(n) \qquad (3.17)$$

The equations (3.17) and (3.15) constitute a recursive version of (3.14). However, notice that the matrix $R(n)$ must be inverted at each iteration, which is very time consuming. This problem can be alleviated by defining the matrix:

$$P(n) = R^{-1}(n)$$

The matrix inversion lemma can be used here to establish the following recursion to update $P(n)$ [Lj83, p. 19]:

$$P(n) = P(n-1) - \frac{P(n-1)\varphi_{ee}(n)\varphi_{ee}^T(n)P(n-1)}{1 + \varphi_{ee}^T(n)P(n-1)\varphi_{ee}(n)} \qquad (3.18)$$

This expression allows the inverted matrix, $R^{-1}(n)$, to be updated directly, rather than first calculating $R(n)$ using (3.15) and then performing the matrix inversion. Note that (3.18) eliminates the need for matrix inversion altogether, as it requires only a single scalar division. The expressions (3.18) and (3.17) with $R^{-1}(n) = P(n)$ constitute what is known as the recursive least squares (RLS) algorithm.
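The following is a minimal RLS sketch following (3.17) and (3.18). It assumes the common initialization $P(0) = \delta I$ with large $\delta$; the text does not specify one. With $\theta(0) = 0$, this initialization makes RLS compute $[R(n) + I/\delta]^{-1}\sum_i y(i)\varphi_{ee}(i)$. The result should therefore agree closely with the batch estimate (3.14). The plant and noise level are hypothetical.

```python
import numpy as np

def simulate_arx(a, b, x, e):
    """A(q^-1) y(n) = B(q^-1) x(n) + e(n), zero initial conditions."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = e[n] + sum(b[j] * x[n - j] for j in range(len(b)) if n - j >= 0)
        acc -= sum(a[i - 1] * y[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
        y[n] = acc
    return y

def regressor(y, x, n, na, nb):
    """phi_ee(n) = [-y(n-1) ... -y(n-na), x(n) ... x(n-nb)]^T, zeros before n = 0."""
    past_y = [-y[n - i] if n - i >= 0 else 0.0 for i in range(1, na + 1)]
    past_x = [x[n - j] if n - j >= 0 else 0.0 for j in range(nb + 1)]
    return np.array(past_y + past_x)

rng = np.random.default_rng(5)
N = 2000
x = rng.standard_normal(N)
y = simulate_arx([-0.5, 0.2], [1.0, 0.5], x, 0.1 * rng.standard_normal(N))
Phi = np.array([regressor(y, x, n, 2, 1) for n in range(N)])

delta = 1e4                          # P(0) = delta * I (assumed initialization)
theta = np.zeros(4)
P = delta * np.eye(4)
for phi, yn in zip(Phi, y):
    ee = yn - theta @ phi                                # equation error using theta(n-1)
    Pphi = P @ phi
    P = P - np.outer(Pphi, Pphi) / (1.0 + phi @ Pphi)    # (3.18); P is symmetric
    theta = theta + P @ phi * ee                         # (3.17) with R^{-1}(n) = P(n)

theta_batch = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)    # (3.14)
P_batch = np.linalg.inv(Phi.T @ Phi + np.eye(4) / delta)
```

Each step costs a few matrix-vector products and one scalar division, with no matrix inversion.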


Weighted Recursive Least Squares (WRLS)

In applications, it is often desirable to assign weights to the individual observations of the least squares parameter estimation problem. Weighting an observation can indicate some measure of its importance, accuracy, or relevance in determining the new parameter estimate, $\theta(n)$. The particular choice of weighting assignments depends on the application and can have either a heuristic or analytic basis. The criterion function (3.11), modified to reflect weighted observations, is now:

$$J_{WLS}(n) = \sum_{i=1}^{n}\alpha(i)\,ee^2(i) \qquad (3.19)$$

where $\{\alpha(i)\}_{i=1}^{n}$ is the sequence of observation weights.

Carrying through the minimization of (3.19) and generating a recursive formulation as done previously yields the following modified least squares algorithm, commonly known as weighted recursive least squares (WRLS):

$$\theta(n) = \theta(n-1) + \alpha(n)P(n)\varphi_{ee}(n)\,ee(n)$$

$$P(n) = P(n-1) - \frac{\alpha(n)P(n-1)\varphi_{ee}(n)\varphi_{ee}^T(n)P(n-1)}{1 + \alpha(n)\varphi_{ee}^T(n)P(n-1)\varphi_{ee}(n)}$$

Recursive Least Squares with Forgetting Factor (RLSFF)

A particular weighting scheme which assigns progressively lower weights to past observations is useful in dealing with time-varying systems. This gives the least squares algorithm the characteristic of "forgetting" data from the distant past, which may not be relevant in determining the current "optimal" value of the parameter estimates. Note that in this scheme, a given observation will systematically be assigned smaller and smaller weights as the time index, $n$, increases. This is in contrast to WRLS, which assigns a constant weight to each observation. The criterion function for this weighting scheme is thus [Lj83, Sec. 2.6.2]:

$$J_{RLSFF}(n) = \sum_{i=1}^{n}\beta(i,n)\,ee^2(i) \qquad (3.20)$$

where the weighting sequence $\{\beta(i,n)\}_{i=1}^{n}$ is increasing in the variable $i$ and decreasing in the variable $n$. A typical choice for this weighting sequence is:

$$\beta(i,n) = \lambda^{n-i}$$

where $\lambda$, called the forgetting factor, is constant, and $0 < \lambda < 1$. This choice of $\beta(i,n)$ is seen to yield an exponential forgetting characteristic when considered as a function of the time variable, $n$. A forgetting factor $\lambda$ yields an algorithm with an effective "memory" of the past $1/(1-\lambda)$ observations [St80].

Allowing the RLS algorithm to "forget" past observations in this manner increases the "bouncing around" of the estimates, $\theta(n)$, about their target values in the steady state, resulting in a higher value of MMSE. This is because useful past data is effectively "thrown out," leaving the algorithm more susceptible to the noise contribution in fewer observations. This characteristic illustrates a general tradeoff which exists in most adaptive algorithms between tracking ability and MMSE.

As before, carrying through the minimization of (3.20) and deriving a recursive algorithm yields the following algorithm, known as recursive least squares with forgetting factor (RLSFF):

$$\theta(n) = \theta(n-1) + P(n)\varphi_{ee}(n)\,ee(n)$$

$$P(n) = \frac{1}{\lambda}\left[P(n-1) - \frac{P(n-1)\varphi_{ee}(n)\varphi_{ee}^T(n)P(n-1)}{\lambda + \varphi_{ee}^T(n)P(n-1)\varphi_{ee}(n)}\right]$$
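The sketch below illustrates the tracking side of the tradeoff. It runs the RLSFF recursion on a hypothetical plant whose $b_0$ jumps from 1 to 2 halfway through the record. With $\lambda = 0.95$, an effective memory of about $1/(1-0.95) = 20$ observations, the estimate follows the jump. With $\lambda = 1$ (ordinary RLS) it averages over both regimes. The plant, the noise-free data, and $P(0) = 10^3 I$ are assumptions.

```python
import numpy as np

def rlsff(Phi, y, lam, delta=1e3):
    """RLS with forgetting factor lam (lam = 1 gives ordinary RLS)."""
    d = Phi.shape[1]
    theta, P = np.zeros(d), delta * np.eye(d)
    for phi, yn in zip(Phi, y):
        ee = yn - theta @ phi                                    # equation error
        Pphi = P @ phi
        P = (P - np.outer(Pphi, Pphi) / (lam + phi @ Pphi)) / lam
        theta = theta + P @ phi * ee
    return theta

rng = np.random.default_rng(6)
N = 4000
x = rng.standard_normal(N)
b0 = np.where(np.arange(N) < N // 2, 1.0, 2.0)   # hypothetical jump in b_0 halfway
y = np.zeros(N)
for n in range(N):
    y_prev = y[n - 1] if n > 0 else 0.0
    x_prev = x[n - 1] if n > 0 else 0.0
    y[n] = 0.5 * y_prev + b0[n] * x[n] + 0.5 * x_prev   # a_1 = -0.5, b_1 = 0.5

# phi_ee(n) = [-y(n-1), x(n), x(n-1)]^T, theta = [a_1, b_0, b_1]^T
Phi = np.array([[-(y[n - 1] if n > 0 else 0.0), x[n], x[n - 1] if n > 0 else 0.0]
                for n in range(N)])
theta_final = np.array([-0.5, 2.0, 0.5])          # plant parameters after the jump

err_forget = np.linalg.norm(rlsff(Phi, y, lam=0.95) - theta_final)
err_no_forget = np.linalg.norm(rlsff(Phi, y, lam=1.0) - theta_final)
```

Adding measurement noise to `y` would show the other side of the tradeoff: the $\lambda = 0.95$ estimate then fluctuates more in the steady state than an estimate with $\lambda$ closer to 1.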

Weighted Recursive Least Squares with Forgetting Factor (WRLSFF)

The three previously presented least squares algorithms can be lumped into one algorithm by employing both weighting schemes simultaneously. This algorithm is called weighted recursive least squares with forgetting factor (WRLSFF):

$$\theta(n) = \theta(n-1) + \alpha(n)P(n)\varphi_{ee}(n)\,ee(n) \qquad (3.21a)$$

$$P(n) = \frac{1}{\lambda}\left[P(n-1) - \frac{\alpha(n)P(n-1)\varphi_{ee}(n)\varphi_{ee}^T(n)P(n-1)}{\lambda + \alpha(n)\varphi_{ee}^T(n)P(n-1)\varphi_{ee}(n)}\right] \qquad (3.21b)$$

This algorithm includes RLS, WRLS, and RLSFF as special cases with appropriate choices for $\alpha(n)$ and $\lambda$. These are $\alpha(n) = 1$, $\lambda = 1$ for RLS; $\lambda = 1$ for WRLS; and $\alpha(n) = 1$ for RLSFF.

As a computational issue, note that all the least squares algorithms given previously require $P(n-1)\varphi_{ee}(n)$ to be evaluated to determine $P(n)$, and then $P(n)\varphi_{ee}(n)$ to be evaluated to update $\theta(n)$. This latter matrix multiplication can be avoided by manipulating the $P(n)$ update equation (3.21b). Postmultiplying both sides by $\varphi_{ee}(n)$ and expressing the term in brackets over a common denominator gives:

$$P(n)\varphi_{ee}(n) = \frac{1}{\lambda}\,\frac{\lambda P(n-1)\varphi_{ee}(n) + \alpha(n)P(n-1)\varphi_{ee}(n)\varphi_{ee}^T(n)P(n-1)\varphi_{ee}(n) - \alpha(n)P(n-1)\varphi_{ee}(n)\varphi_{ee}^T(n)P(n-1)\varphi_{ee}(n)}{\lambda + \alpha(n)\varphi_{ee}^T(n)P(n-1)\varphi_{ee}(n)} = \frac{P(n-1)\varphi_{ee}(n)}{\lambda + \alpha(n)\varphi_{ee}^T(n)P(n-1)\varphi_{ee}(n)}$$

Using this relationship, the least squares recursion of (3.21) can be implemented in the following four-step procedure:

1) Calculate $M(n) = P(n-1)\varphi_{ee}(n)$ (3.22a)

2) Calculate $L(n) = \dfrac{\alpha(n)M(n)}{\lambda + \alpha(n)\varphi_{ee}^T(n)M(n)}$ (3.22b)

3) $\theta(n) = \theta(n-1) + L(n)\,ee(n)$ (3.22c)

4) $P(n) = \dfrac{1}{\lambda}\left[P(n-1) - L(n)M^T(n)\right]$ (3.22d)
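The four-step procedure is algebraically identical to (3.21). The sketch below checks this numerically using random data, random weights $\alpha(n)$, and $\lambda = 0.98$, all of which are hypothetical. The function names are illustrative only.

```python
import numpy as np

def wrlsff_direct(theta, P, phi, y, alpha, lam):
    """One step of (3.21a)-(3.21b)."""
    ee = y - theta @ phi
    Pphi = P @ phi
    P_new = (P - alpha * np.outer(Pphi, Pphi) / (lam + alpha * phi @ Pphi)) / lam
    theta_new = theta + alpha * P_new @ phi * ee
    return theta_new, P_new

def wrlsff_four_step(theta, P, phi, y, alpha, lam):
    """The same step organized as (3.22a)-(3.22d)."""
    ee = y - theta @ phi
    M = P @ phi                                   # (3.22a)
    L = alpha * M / (lam + alpha * phi @ M)       # (3.22b)
    theta_new = theta + L * ee                    # (3.22c)
    P_new = (P - np.outer(L, M)) / lam            # (3.22d)
    return theta_new, P_new

rng = np.random.default_rng(7)
Phi = rng.standard_normal((500, 3))
y = Phi @ np.array([0.3, -1.0, 2.0]) + 0.05 * rng.standard_normal(500)
alphas = rng.uniform(0.5, 1.5, size=500)          # hypothetical observation weights
lam = 0.98
state_a = (np.zeros(3), 100.0 * np.eye(3))
state_b = (np.zeros(3), 100.0 * np.eye(3))
for phi, yn, al in zip(Phi, y, alphas):
    state_a = wrlsff_direct(*state_a, phi, yn, al, lam)
    state_b = wrlsff_four_step(*state_b, phi, yn, al, lam)
```

The four-step form computes the single product $M(n) = P(n-1)\varphi_{ee}(n)$ and reuses it, which is the saving described above.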

3.2.2 The Output Error Least Squares Algorithm (RPE)

Unfortunately, analytic minimization of the criterion (3.11) when $e(n) = oe(n)$ turns out to be impossible for a recursive algorithm. This is because of the highly nonlinear relationship between the parameter estimates, $\theta$, and the corresponding adaptive filter output, $\hat{y}(n) = \theta^T\varphi_{oe}(n)$. The regressor $\varphi_{oe}(n)$ itself contains past outputs $\hat{y}(n-i)$ that depend on $\theta$. The difficulty lies in solving (3.12), set equal to zero, for the least squares estimate $\theta_{LS}(n)$. Recall that in the equation error case, $e(i)$ is linear in $\theta$ and thus $de(i)/d\theta$ is constant. This gives rise to the estimate $\theta_{LS}(n)$, which is the solution to a linear (matrix) equation. The nonlinear equation resulting from the output error model cannot be solved so simply. Instead, numerical methods must be used to minimize (3.11). This procedure usually requires several evaluations of (3.11), each of which uses all of the data since the initial time, $n = 1$. Therefore, an exact recursive algorithm based on this type of minimization is not possible.

It is thus necessary to introduce approximations in the quest for a recursive algorithm that minimizes (3.11), at least approximately. A very general method is presented in [Lj83, Sec. 3.7.2]. It accommodates a plant having the structure of an extended form of the Box-Jenkins model, which was briefly mentioned in Chapter 1. Algorithms such as this, which can approximately minimize the least squares criterion for plant structures more complicated than ARX, are known as recursive prediction error (RPE) methods.

The problem at hand is to minimize (3.11) using the output error, $oe(n)$, as the error term, $e(n)$. The output error should be thought of here more generally as the error between an adaptive filter and a plant having the ARMAX structure of Figure 2.1. Notice that no mention has been made here of the structure of the adaptive filter. This is because the general RPE method specifies the structure of the optimal adaptive filter based on the plant structure. It turns out that the ARX adaptive filter of the output error adaptive system, shown in Figure 2.3, is the optimal adaptive filter structure for use with the given ARMAX plant structure. This should not be surprising, given the investigation and discussion of the properties of the output error model in Chapter 2 (i.e., unbiased estimates and minimum possible MMSE of $\sigma_v^2$).

The RPE algorithm is now shown in the context of the output error adaptive system of Figure 2.3:

$$\theta(n) = \theta(n-1) + \alpha(n)P(n)\psi(n)\,oe(n) \qquad (3.23a)$$

$$P(n) = \frac{1}{\lambda}\left[P(n-1) - \frac{\alpha(n)P(n-1)\psi(n)\psi^T(n)P(n-1)}{\lambda + \alpha(n)\psi^T(n)P(n-1)\psi(n)}\right] \qquad (3.23b)$$

where

$$\psi(n) = -\nabla oe(n) = \frac{1}{A(q^{-1},n-1)}\varphi_{oe}(n)$$

The forgetting factor, $\lambda$, and the observation weighting coefficient, $\alpha(n)$, have also been included here and serve the same purpose as in WRLSFF. Note the striking similarity between the RPE algorithm and WRLSFF. In fact, the RPE algorithm reduces to WRLSFF when the equation error quantities, $ee(n)$ and $\varphi_{ee}(n)$, are used in place of the corresponding output error quantities, $oe(n)$ and $\psi(n)$. Also note that (3.23a) reduces to the gradient methods of (3.5) and (3.10), for the equation error and output error cases respectively, when $\alpha(n) = 1$ and $P(n) = \mu I$. These are some examples of the generality and wide applicability of the RPE algorithm. It should also be noted that (3.22) can be used to implement the recursion (3.23) with the appropriate changes of notation.
3.3 The Method of Optimal Bounding Ellipsoids (OBE)

Recently, a system identification technique providing an alternative to least squares equation error minimization has been proposed [Fo82], [Da87]. In these papers, algorithms were developed which perform optimization based on geometric considerations rather than the analytic minimization of (3.11). What sets these algorithms apart from least squares methods is the manner in which the noise contribution to the plant output is characterized. Until now, statistical assumptions have been imposed on the noise (e.g., white noise). These algorithms were instead developed assuming that the noise contribution in the plant description has a magnitude bound, $\gamma$, at every time instant $n$. This key assumption gives rise to an algorithm able to decide very quickly whether the current data observation, $\varphi_{ee}(n)$, yields any additional information which could improve upon the previous estimate, $\theta(n-1)$. If it is determined that $\varphi_{ee}(n)$ contains no new information, updating (a computationally expensive operation) need not occur. In other words, this algorithm possesses the very attractive characteristic of selective updating. This feature has been seen in simulations to reduce the amount of computation considerably, as the algorithm tends to use less than twenty percent of the input data [Hu86].

The geometric criterion used in the OBE algorithms is described using the concept of a membership set. A membership set is the set of points in the parameter space which are consistent with the data observations, the assumed plant structure, and the noise bound. Initially, the membership set is the set of all points in the $(n_a+n_b+1)$-dimensional parameter space. In practice, however, it is chosen to be a very large $(n_a+n_b+1)$-dimensional ellipsoid which must include all possible valid parameter values of the plant. Starting with this initial ellipsoid, the following algorithm is then implemented:
1) A check is made of each subsequent data observation to determine whether the "size" of the previous ellipsoid can be reduced by utilizing the current data (regressor vector), $\varphi_{ee}(n)$.*

2a) If so, a new, smaller ellipsoid is then determined by the algorithm.

2b) If not, the current data is discarded and the previous ellipsoid is kept as the current ellipsoid. The next data observation is brought in, and the process is repeated by returning to step 1).

At each iteration $k$, the corresponding parameter estimate, $\theta(k)$, is taken to be the center of the ellipsoid generated so far from the current and previous iterations, $i = 1, \ldots, k$.

The algorithm derived in [Da87], using $\sigma^2(n)$ as an optimization parameter, has a very familiar form. In fact, it is identical to WRLSFF of (3.21) with a data-dependent scheme for generating the weighting coefficients $\alpha(n)$. A time-varying forgetting factor, $\lambda(n)$, is also employed, and is related to the weighting coefficient by:

$$\lambda(n) = 1 - \alpha(n)$$

Define the following quantities, as in [Da87]:

$$G(n) = \varphi_{ee}^T(n)P(n-1)\varphi_{ee}(n)$$

$$\beta(n) = \frac{\gamma^2 - \sigma^2(n-1)}{ee^2(n)}$$

$$\bar{\alpha} \in (0,1), \text{ a design parameter}$$

The OBE algorithm is now presented:


* Various measures of the "size" of the ellipsoid have been employed. In [Fo82], the size was defined in two ways, each yielding a slightly different algorithm. The two measures of size were 1) the volume of the ellipsoid, and 2) the sum of the semi-axes of the ellipsoid. In [Da87], a more abstract minimization criterion, $\sigma^2(n)$, was used (note that this has nothing to do with noise variance). As will be seen, this yields a simpler algorithm which is essentially an implementation of WRLSFF.
while (not done) do begin
    get the current data vector, φee(n)
    if γ² > σ²(n-1) + ee²(n) then begin   { no update needed }
        σ²(n) := σ²(n-1)
        θ(n) := θ(n-1)
    end
    else begin   { determine the weighting coefficient α(n), the new value of σ²(n), and update with WRLSFF }
        if β(n)[G(n) - 1] + 1 > 0 then
            v(n) := [1 - sqrt( G(n) / (1 + β(n)[G(n) - 1]) )] / [1 - G(n)]
        else
            v(n) := ᾱ
        α(n) := min(ᾱ, v(n))
        σ²(n) := [1 - α(n)]σ²(n-1) + α(n)γ² - α(n)[1 - α(n)]ee²(n) / [1 - α(n) + α(n)G(n)]
        implement (3.21) with λ(n) = 1 - α(n)
    end { if }
end { while }
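The pseudocode above can be sketched in Python as follows, assuming the [Da87] formulas as reconstructed here. Two guards are additions not present in the pseudocode. The first sets $v(n) = \bar{\alpha}$ when $ee(n) = 0$, where $\beta(n)$ is undefined. The second uses the limiting value $v(n) = (1-\beta(n))/2$ when $G(n) = 1$. The following are illustrative assumptions:
- the ARX plant;
- the uniform equation noise with bound $\gamma = 0.1$;
- the initial ellipsoid, $P(0) = 100I$ and $\sigma^2(0) = 1$;
- $\bar{\alpha} = 0.3$.

```python
import numpy as np

def simulate_arx(a, b, x, e):
    """A(q^-1) y(n) = B(q^-1) x(n) + e(n), zero initial conditions."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = e[n] + sum(b[j] * x[n - j] for j in range(len(b)) if n - j >= 0)
        acc -= sum(a[i - 1] * y[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
        y[n] = acc
    return y

def regressor(y, x, n, na, nb):
    """phi_ee(n) = [-y(n-1) ... -y(n-na), x(n) ... x(n-nb)]^T, zeros before n = 0."""
    past_y = [-y[n - i] if n - i >= 0 else 0.0 for i in range(1, na + 1)]
    past_x = [x[n - j] if n - j >= 0 else 0.0 for j in range(nb + 1)]
    return np.array(past_y + past_x)

def obe(Phi, y, gamma, alpha_bar=0.3, p0=100.0, sigma2_0=1.0):
    """Sketch of the OBE recursion: returns (theta, number of updates)."""
    d = Phi.shape[1]
    theta, P, sigma2 = np.zeros(d), p0 * np.eye(d), sigma2_0
    updates = 0
    for phi, yn in zip(Phi, y):
        ee = yn - theta @ phi
        if gamma ** 2 > sigma2 + ee ** 2:
            continue                                   # no update needed
        G = phi @ P @ phi
        if ee == 0.0:
            v = alpha_bar                              # added guard: beta(n) undefined
        else:
            beta = (gamma ** 2 - sigma2) / ee ** 2
            if beta * (G - 1.0) + 1.0 > 0.0:
                if abs(1.0 - G) < 1e-12:
                    v = (1.0 - beta) / 2.0             # added guard: limit as G(n) -> 1
                else:
                    v = (1.0 - np.sqrt(G / (1.0 + beta * (G - 1.0)))) / (1.0 - G)
            else:
                v = alpha_bar
        alpha = min(alpha_bar, v)
        lam = 1.0 - alpha                              # lambda(n) = 1 - alpha(n)
        denom = lam + alpha * G                        # 1 - alpha(n) + alpha(n) G(n)
        sigma2 = lam * sigma2 + alpha * gamma ** 2 - alpha * lam * ee ** 2 / denom
        Pphi = P @ phi
        P = (P - alpha * np.outer(Pphi, Pphi) / denom) / lam    # (3.21b)
        theta = theta + alpha * P @ phi * ee                    # (3.21a)
        updates += 1
    return theta, updates

rng = np.random.default_rng(8)
N, gamma = 4000, 0.1
x = rng.standard_normal(N)
e = rng.uniform(-gamma, gamma, N)                 # bounded noise, |e(n)| <= gamma
y = simulate_arx([-0.5, 0.2], [1.0, 0.5], x, e)   # hypothetical ARX plant
Phi = np.array([regressor(y, x, n, 2, 1) for n in range(N)])
theta_obe, n_updates = obe(Phi, y, gamma)
```

Comparing `n_updates` with `N` gives a direct view of the selective updating property. The fraction of samples actually used depends on the data and on $\bar{\alpha}$, so no particular value should be read into this example.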

In summary, the OBE algorithm possesses two key properties which could be very desirable in applications. They are:

1) The bounded noise assumption. Most algorithms employ statistical assumptions to characterize the noise contribution in the plant output (e.g., white noise). In practice, however, these assumptions are difficult both to guarantee and to verify. It is usually much easier to obtain a magnitude bound on the output noise.

2) The selective updating strategy. This frees the algorithm for about 70-80% of the total running time, which opens up (largely unexplored) possibilities for time-sharing the algorithm with multiple processes.

3.4 The Steiglitz-McBride Method (SMM)

As mentioned at the beginning of this chapter, both the gradient and least squares algorithms implement the process of "looking down" the MSE surface. Section 2.3.2 described how local minima in the MSE surface of the output error adaptive system could cause this type of algorithm to fail. It is therefore of interest to examine alternative methods of system identification. Such methods should not require unimodality of the output error surface for the adaptive filter parameter estimates, $\theta$, to converge to the plant parameters, $\theta_p$.

An adaptive algorithm was developed recently by Fan [Fa86] which minimizes a criterion first considered by Steiglitz and McBride. For sufficient order adaptive systems (i.e., $\hat{n}_a \geq n_a$ and $\hat{n}_b \geq n_b$), the SMM criterion is unimodal. Its global minimum coincides with the global minimum of the MSOE surface. Simulation studies have also shown this to be true in some reduced order cases (i.e., $\hat{n}_a < n_a$ or $\hat{n}_b < n_b$). Reduced order adaptive systems are of great interest, since in most practical situations the plant order is unknown.* Furthermore, in many cases the plant may not even be of the form $B(q^{-1})/A(q^{-1})$ assumed throughout this thesis. In this case, an adaptive filter of sufficient order would need an infinite-dimensional parameter vector $\theta$ (i.e., $\hat{n}_a \to \infty$ and/or $\hat{n}_b \to \infty$). Thus acceptable (hopefully optimal in some sense) reduced order performance is an important feature for any practical adaptive system to have.

* In fact, reduced order systems can cause the existence of local minima in the MSOE surface. Cases have been documented [St81] of adaptive systems that possess unimodal MSOE surfaces with a sufficient order adaptive filter, but multimodal MSOE surfaces when that filter is replaced with one of reduced order.

The SMM scheme is presented here as follows. Consider the ARMAX plant of Figure 2.1, described by the relationship:

$$A(q^{-1})y(n) = B(q^{-1})x(n) + A(q^{-1})v(n)$$

If the quantities $x(n)$, $y(n)$, and $v(n)$ are autoregressively filtered by the $A(q^{-1})$ polynomial, the above relationship can be expressed as:

$$A(q^{-1})y'(n) = B(q^{-1})x'(n) + v(n) \qquad (3.24)$$

where

$$y'(n) = \frac{1}{A(q^{-1})}y(n), \qquad x'(n) = \frac{1}{A(q^{-1})}x(n)$$

In what follows, the goal of the SMM algorithms will be seen to be minimization of the equation error of the "primed" adaptive system whose plant is described by (3.24). This is the essence of the SMM approach. Minimization will be accomplished using both the gradient and least squares techniques.


3.4.1  Gradient  Minimization 

Observe that (3.24) describes an ARX plant with input x'(n) and output y'(n). As
mentioned in Section 2.3.1, the equation error adaptive system simultaneously provides
minimum MSE and unbiased estimates for ARX plant structures. Therefore, one
might expect the algorithms presented thus far for equation error systems to perform
optimally for the "primed" equation error adaptive system having a plant described by
(3.24), with input x'(n) and output y'(n). In particular, applying the LMS algorithm of
(3.5) yields:

$$\theta(n) = \theta(n-1) + \mu\,\varphi'_{ee}(n)\,ee'(n) \tag{3.25}$$

where

$$\varphi'_{ee}(n) = [-y'(n-1)\ \cdots\ -y'(n-n_a)\ \ x'(n)\ \cdots\ x'(n-n_b)]^T$$

and ee'(n) is the equation error of the adaptive system having the ARX plant described by
(3.24).

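The relationship between ee(n) and ee'(n) will be needed later. It can be derived
simply as follows. From (2.19) applied to the ARX system of (3.24), it is seen that

$$ee'(n) = \tilde{\theta}^T\varphi'_{ee}(n) + v(n) \tag{3.26}$$

The expression for the equation error was given in (2.7) as:

$$ee(n) = \tilde{\theta}^T\varphi_{ee}(n) + A(q^{-1})v(n) \tag{3.27}$$

Autoregressively filtering each term of (3.27) by $A(q^{-1})$ (with the parameters held fixed, $\tilde{\theta}$ does not depend on n and passes through the filter) yields:

$$\frac{1}{A(q^{-1})}\,ee(n) = \tilde{\theta}^T\varphi'_{ee}(n) + v(n) \tag{3.28}$$

Thus it is seen from (3.26) and (3.28) that:

$$ee'(n) = \frac{1}{A(q^{-1})}\,ee(n) \tag{3.29}$$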

This relationship should have been expected, since the "prime" here denotes
autoregressive filtering by $A(q^{-1})$.
Note that generating the filtered regressor, $\varphi'_{ee}(n)$, requires knowledge of the $A(q^{-1})$
polynomial, which is not known. As mentioned in the footnote of Section 3.1.2 regarding
the generation of an expression for $\nabla oe(n)$, the most recent estimate of $A(q^{-1})$, which is
$A(q^{-1},n-1)$, can be used to (approximately) implement the filtering operation. Thus the
following approximation is made, yielding the "EF" algorithms of [Fa86]:

$$\varphi'_{ee}(n) = \frac{1}{A(q^{-1})}\,\varphi_{ee}(n) \approx \frac{1}{A(q^{-1},n-1)}\,\varphi_{ee}(n) \tag{3.30}$$

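This filtering operation can be further simplified, as was also shown in Section 3.1.2, by
defining the filtered quantities:

$$y^F(n) = \frac{1}{A(q^{-1},n-1)}\,y(n), \qquad x^F(n) = \frac{1}{A(q^{-1},n-1)}\,x(n)$$

Therefore a simpler approximation to $\varphi'_{ee}(n)$ is:

$$\varphi'_{ee}(n) \approx [-y^F(n-1)\ \cdots\ -y^F(n-n_a)\ \ x^F(n)\ \cdots\ x^F(n-n_b)]^T \tag{3.31}$$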


This approximation yields the simpler non-"IF" algorithms in [Fa86].


In addition to making the regressor filtering realizable, a very interesting
relationship results from using $A(q^{-1},n-1)$ for the filtering operation. The relationship
(3.29) is now modified to:

$$ee'(n) = \frac{1}{A(q^{-1})}\,ee(n) \approx \frac{1}{A(q^{-1},n-1)}\,ee(n) \approx oe(n) \tag{3.32}$$
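The identity underlying (3.32) is exact for a fixed estimate: the output error equals the equation error filtered by the estimated $1/A$, and the approximation in (3.32) only enters because the estimates change with n. The sketch below checks this for an arbitrary, hypothetical fixed estimate on data simulated from the case 1) plant of Chapter 4; `fir` and `ar` are illustrative helpers, not routines from the thesis.

```python
import numpy as np


def fir(c, s):
    """Apply C(q^-1) = c[0] + c[1] q^-1 + ... to s, zero initial conditions."""
    out = np.zeros(len(s))
    for n in range(len(s)):
        out[n] = sum(c[k] * s[n - k] for k in range(len(c)) if n - k >= 0)
    return out


def ar(a, s):
    """Apply 1/A(q^-1) to s, A monic (a[0] == 1), zero initial conditions."""
    out = np.zeros(len(s))
    for n in range(len(s)):
        out[n] = s[n] - sum(a[k] * out[n - k] for k in range(1, len(a)) if n - k >= 0)
    return out


rng = np.random.default_rng(1)
N = 300
a_p = np.array([1.0, -1.2, 0.6])                  # plant 1/(1 - 1.2 q^-1 + 0.6 q^-2)
x = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), N)
y = ar(a_p, x) + rng.uniform(-0.2, 0.2, N)

# a fixed, hypothetical estimate theta = [a1, a2, b0] (stable A)
a_hat = np.array([1.0, -1.0, 0.5])
b_hat = np.array([0.8])

ee = fir(a_hat, y) - fir(b_hat, x)     # equation error: A(q^-1) y - B(q^-1) x
oe = y - ar(a_hat, fir(b_hat, x))      # output error: y - (B/A) x
# filtering ee by 1/A(q^-1) reproduces oe up to rounding error
print(np.max(np.abs(ar(a_hat, ee) - oe)))
```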


In light of the approximate equivalence between ee'(n) and oe(n) in (3.32) as the AR
adaptive filter parameters converge to those of the plant, it has been shown [Fa86] that the
LMS-type algorithm of (3.25), through minimization of the "primed" adaptive system
equation error, ee'(n), will approximately minimize the output error of the original
ARMAX adaptive system of Figure 2.3, thus providing unbiased estimates. Also, in order
to generate a realizable algorithm, (3.32) can be applied to (3.25), yielding the output error
algorithm given in [Fa86]:

$$\theta(n) = \theta(n-1) + \mu\,\varphi'_{ee}(n)\,oe(n) \tag{3.33}$$

As previously mentioned, (3.30) or (3.31) can be used to approximately determine $\varphi'_{ee}(n)$,
yielding the "EF" and non-"IF" algorithms, respectively, in [Fa86].
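A minimal sketch of the output error update (3.33) with the simpler regressor (3.31) follows. It assumes $\theta = [a_1 \cdots a_{n_a}\ b_0 \cdots b_{n_b}]^T$ with $A(q^{-1},n) = 1 + a_1(n)q^{-1} + \cdots$, consistent with the $-y$ entries of the regressor. The step size, the data length, the function names, and the crude stability check (an update that would move a pole onto or outside the unit circle is simply rejected) are assumptions standing in for the stability projection of Section 2.3.3; this is not the code used in [Fa86].

```python
import numpy as np


def is_stable(a):
    """True if 1 + a[0] q^-1 + ... + a[-1] q^-na has all poles inside the unit circle."""
    a = np.asarray(a, dtype=float)
    if not np.all(np.isfinite(a)):
        return False
    return bool(np.all(np.abs(np.roots(np.concatenate(([1.0], a)))) < 1.0))


def past(s, n, k):
    """s(n - k), with zero initial conditions."""
    return s[n - k] if n - k >= 0 else 0.0


def smm_lms(x, y, na, nb, mu):
    """Output error LMS-type update (3.33) with the filtered regressor (3.31).

    Returns the final theta = [a_1..a_na, b_0..b_nb] and the sequence oe(n).
    """
    N = len(x)
    theta = np.zeros(na + nb + 1)
    xf, yf, yhat, oe = (np.zeros(N) for _ in range(4))
    for n in range(N):
        a, b = theta[:na], theta[na:]
        # filtering by 1/A(q^-1, n-1): s^F(n) = s(n) - sum_i a_i s^F(n-i)
        yf[n] = y[n] - sum(a[i - 1] * past(yf, n, i) for i in range(1, na + 1))
        xf[n] = x[n] - sum(a[i - 1] * past(xf, n, i) for i in range(1, na + 1))
        # output of the adaptive IIR filter with the current estimates
        yhat[n] = (-sum(a[i - 1] * past(yhat, n, i) for i in range(1, na + 1))
                   + sum(b[j] * past(x, n, j) for j in range(nb + 1)))
        oe[n] = y[n] - yhat[n]
        # regressor (3.31)
        phi = np.array([-past(yf, n, i) for i in range(1, na + 1)]
                       + [past(xf, n, j) for j in range(nb + 1)])
        candidate = theta + mu * phi * oe[n]          # update (3.33)
        # crude stand-in for stability projection: reject unstable updates
        if np.all(np.isfinite(candidate)) and is_stable(candidate[:na]):
            theta = candidate
    return theta, oe


# usage: case 1) plant 1/(1 - 1.2 q^-1 + 0.6 q^-2) with small output noise
rng = np.random.default_rng(0)
N = 2000
x = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), N)   # zero mean, unit variance
y = np.zeros(N)
for n in range(N):
    y[n] = x[n] + 1.2 * past(y, n, 1) - 0.6 * past(y, n, 2)
y += rng.uniform(-0.1, 0.1, N)
theta_hat, oe = smm_lms(x, y, na=2, nb=0, mu=0.002)
print(theta_hat)   # compare with theta_p = [-1.2, 0.6, 1.0]
```

Replacing the regressor line with a full re-filtering of $\varphi_{ee}(n)$ by the current estimate would give the (3.30) variant instead.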

3.4.2  Least  Squares  Minimization 

It  is  also  natural  to  consider  a  least  squares  minimization  of  the  equation  error  of  the 
"primed"  adaptive  system  in  addition  to  gradient  minimization.  This  is  accomplished 
straightforwardly  by  recalling  the  least-squares  criterion  introduced  in  Section  3.2: 

$$J_{LS}(n) = \sum_{i=1}^{n} e^2(i)$$

For the current problem, $e(i) \equiv ee'(i)$, yielding the following criterion for the "primed"
equation error system:

$$J_{LS}(n) = \sum_{i=1}^{n} [ee'(i)]^2 \tag{3.34}$$

Recall that the sequence of error values, $\{ee'(i)\}_{i=1}^{n}$, is obtained by holding the parameters
of the adaptive filter constant over the interval i = 1, ..., n, these parameters being denoted as
θ(n). Applying the equation error relation (2.8) to the primed system thus expands (3.34)
to:

$$J_{LS}(n) = \sum_{i=1}^{n}\left[A(q^{-1},n)\,y'(i) - B(q^{-1},n)\,x'(i)\right]^2$$
i=l 
Finally, again recall the process of implementing filtered quantities: the last available
estimates are used to filter the desired signal. Thus, substituting for the signals x'(i) and
y'(i) their actual realizations yields the desired least squares criterion for the "primed"
adaptive system:

$$J_{LS}(n) = \sum_{i=1}^{n}\left[\frac{A(q^{-1},n)}{A(q^{-1},i-1)}\,y(i) - \frac{B(q^{-1},n)}{A(q^{-1},i-1)}\,x(i)\right]^2 \tag{3.35}$$

This is the general form of the SMM criterion [Fa89], which has been seen to exhibit a
unimodal characteristic even when the original output error least squares criterion, $\sum oe^2(i)$,
(or equivalently the MSOE surface) is multimodal.

It is interesting to consider the criterion (3.35) as the estimates converge to the true
plant parameters. In this case, most of the polynomials $A(q^{-1},i)$ and $B(q^{-1},i)$,
for i = 1, ..., n, will be approximately equal. Therefore, upon convergence to the plant
parameters, the criterion will approach the following expression:

$$J_{LS}(n) \approx \sum_{i=1}^{n}\left[y(i) - \frac{B(q^{-1},n)}{A(q^{-1},n)}\,x(i)\right]^2$$

The term in brackets is now seen to be the output error, oe(i). Thus, if the plant parameters
do in fact minimize (3.35), minimization of this criterion is then
equivalent to output error minimization. This has been proved for the case of a sufficient
order adaptive system having white output noise, v(n) [Stc81].

Returning to the original SMM criterion of (3.35), it can be seen that minimization
of $J_{LS}(n)$ is exactly the same as least squares minimization on the "primed" adaptive
system, i.e., considering the signals x'(n) and y'(n) as the input and output signals of an
ARX plant. Also, in light of the equivalence between the output error criterion $\sum oe^2(i)$ and $J_{LS}(n)$ as $\theta(n) \to \theta_p$, it has
been shown that the output error, oe(n), can be utilized in place of ee'(n), similar to the
gradient case. Therefore, the SMM algorithm is the following modified version of RLS:

$$\theta(n) = \theta(n-1) + R^{-1}(n)\,\varphi'_{ee}(n)\,oe(n)$$

$$R(n) = R(n-1) + \varphi'_{ee}(n)\,[\varphi'_{ee}(n)]^T$$

The  above  algorithm  will  be  referred  to  as  SMM(RLS),  since  it  uses  the  recursive  least 
squares  algorithm  in  the  context  of  the  "primed"  adaptive  system  of  the  SMM  approach. 
Note  that  the  data  weighting  and  forgetting  factor  techniques  discussed  in  Section  3.2.1  can 
also  be  utilized  to  implement  the  SMM  approach.  In  the  following  chapter,  the  behavior  of 
one  of  these  standard  SMM  algorithms,  SMM(RLSFF),  will  be  observed  through 
computer simulations and compared with a new SMM-type algorithm.
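The SMM(RLS) recursion differs from (3.33) only in its gain. Below is a sketch of the forgetting-factor variant SMM(RLSFF) mentioned above, assuming the usual exponentially weighted form $R(n) = \lambda R(n-1) + \varphi'_{ee}(n)[\varphi'_{ee}(n)]^T$ (λ = 1 recovers SMM(RLS)) and propagating $P(n) = R^{-1}(n)$ with the matrix inversion lemma. λ = 0.99 matches the value used in Chapter 4; the initialization P(0) = 100 I, the rejection-type stability check, and the function names are assumptions of this illustration.

```python
import numpy as np


def is_stable(a):
    """True if 1 + a[0] q^-1 + ... + a[-1] q^-na has all poles inside the unit circle."""
    a = np.asarray(a, dtype=float)
    if not np.all(np.isfinite(a)):
        return False
    return bool(np.all(np.abs(np.roots(np.concatenate(([1.0], a)))) < 1.0))


def past(s, n, k):
    """s(n - k), with zero initial conditions."""
    return s[n - k] if n - k >= 0 else 0.0


def smm_rlsff(x, y, na, nb, lam=0.99, p0=100.0):
    """SMM(RLSFF) sketch: RLS with forgetting factor on the primed system,
    using oe(n) in place of ee'(n); P(n) plays the role of R^-1(n)."""
    N = len(x)
    d = na + nb + 1
    theta, P = np.zeros(d), p0 * np.eye(d)
    xf, yf, yhat, oe = (np.zeros(N) for _ in range(4))
    for n in range(N):
        a, b = theta[:na], theta[na:]
        yf[n] = y[n] - sum(a[i - 1] * past(yf, n, i) for i in range(1, na + 1))
        xf[n] = x[n] - sum(a[i - 1] * past(xf, n, i) for i in range(1, na + 1))
        yhat[n] = (-sum(a[i - 1] * past(yhat, n, i) for i in range(1, na + 1))
                   + sum(b[j] * past(x, n, j) for j in range(nb + 1)))
        oe[n] = y[n] - yhat[n]
        phi = np.array([-past(yf, n, i) for i in range(1, na + 1)]
                       + [past(xf, n, j) for j in range(nb + 1)])
        Pphi = P @ phi
        k = Pphi / (lam + phi @ Pphi)        # gain k = R^-1(n) phi(n)
        P = (P - np.outer(k, Pphi)) / lam    # P(n) = R^-1(n)
        candidate = theta + k * oe[n]
        if np.all(np.isfinite(candidate)) and is_stable(candidate[:na]):
            theta = candidate
    return theta, oe


# usage on case 1) data, as for the LMS-type sketch
rng = np.random.default_rng(0)
N = 2000
x = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), N)
y = np.zeros(N)
for n in range(N):
    y[n] = x[n] + 1.2 * past(y, n, 1) - 0.6 * past(y, n, 2)
y += rng.uniform(-0.1, 0.1, N)
theta_hat, oe = smm_rlsff(x, y, na=2, nb=0)
print(theta_hat)   # compare with theta_p = [-1.2, 0.6, 1.0]
```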
CHAPTER 4


SOME  NEW  OUTPUT  ERROR  ADAPTIVE  ALGORITHMS 


4.1 Use of OBE in the Output Error Adaptive System

Until now, the OBE algorithms have not been implemented in an output error
adaptive system for identification of the ARMAX plant of Figure 2.1. One of the OBE
algorithms has been extended [Rao89, Ch. 3], however, in a manner similar to the extended
least squares (ELS) scheme [Lj83, Sec. 2.5.1], which permits identification of a general
ARMAX plant (EOBE). Though this is an important result in its own right, it places
restrictions on the $c_i$ coefficients of the plant to ensure proper convergence of the adaptive
filter parameters. This limits the plants to which this technique can be applied since, in
general, there is no control over the plant parameters. For the current ARMAX system
identification problem, it was shown in Chapter 2 that $c_i = a_i$ for $i = 0, \ldots, n_a$ ($a_0 = 1$). It will
be shown here that this ARMAX case can alternatively be dealt with by considering the
output error adaptive system.

To understand why OBE cannot be directly applied to the output error
adaptive system, consider the expression for the output, y(n), of the plant, given as:

$$y(n) = -\sum_{i=1}^{n_a} a_i\,y(n-i) + \sum_{i=0}^{n_b} b_i\,x(n-i) + \sum_{i=0}^{n_a} a_i\,v(n-i)$$

Recall the key assumption of the OBE algorithm: the contribution of the noise to the plant
output must be bounded. In practice, the quantity which is most realistically bounded is the
output noise term, v(n). However, knowledge of a magnitude bound on v(n) is not
equivalent to knowing a bound on the total noise contribution to y(n). This is because the
noise contribution is actually an FIR filtered version of v(n), with the filter being the
unknown autoregressive plant polynomial, $A(q^{-1})$. Since the noise bound depends on the
unknown plant parameters, applying OBE in this situation is not a well-posed problem.
This is one of the reasons why, for proper operation of OBE on this system, some
restrictions must be placed on the $a_i$ coefficients to ensure that the total noise term satisfies a
bound when v(n) itself is bounded [Rao89, Sec. 3.3].
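To make the dependence concrete: if $|v(n)| \le \gamma$, the largest possible magnitude of the noise contribution $\sum_{i=0}^{n_a} a_i v(n-i)$ is $\gamma\sum_i |a_i|$, attained when each $v(n-i)$ equals γ times the sign of $a_i$. The small sketch below evaluates this for the case 1) denominator of Section 4.1.1, used only as an example; the function names are hypothetical.

```python
def total_noise_bound(a, gamma):
    """Worst-case |sum_i a_i v(n-i)| when |v(n)| <= gamma for all n."""
    return gamma * sum(abs(c) for c in a)


def worst_case_noise(a, gamma):
    """Noise samples v(n-i) = gamma * sign(a_i) that attain the bound."""
    return [gamma if c >= 0 else -gamma for c in a]


a_case1 = [1.0, -1.2, 0.6]     # A(q^-1) = 1 - 1.2 q^-1 + 0.6 q^-2
gamma = 0.5
v_worst = worst_case_noise(a_case1, gamma)
achieved = sum(c * v for c, v in zip(a_case1, v_worst))
print(total_noise_bound(a_case1, gamma), achieved)
```

The same |v(n)| ≤ 0.5 gives a total-noise bound of 1.4 for this plant but a different bound for any other A, which is why the bound cannot be supplied to OBE without knowing the plant.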

4.1.1 Presentation of the Algorithm

As an alternative to the general ARMAX identification scheme of EOBE, again
consider the plant description, given in operator notation:

$$A(q^{-1})y(n) = B(q^{-1})x(n) + A(q^{-1})v(n)$$

Autoregressively filtering each quantity by $A(q^{-1})$, exactly as done in Section 3.4, yields
an SMM-type approach to identification of the plant:

$$A(q^{-1})y'(n) = B(q^{-1})x'(n) + v(n) \tag{4.1}$$

It is important to see here that v(n) now appears "as is" in the alternative plant description
(4.1), and thus the bounded noise assumption is directly satisfied without requiring any
restrictions on the plant. Furthermore, note that (4.1) describes an ARX system with input
x'(n) and output y'(n), which is the structure needed to utilize OBE. Also recall (Section
3.4, Eq. (3.32)) the approximate equivalence between the error of the equation error adaptive system
using (4.1) as the plant description and the error of the output error adaptive system having
the original ARMAX plant.
Therefore, using OBE to identify the original ARMAX plant is equivalent to
applying the algorithm to the "primed" ARX system of (4.1). This requires two
modifications to the original algorithm:

1) The signals x'(n) and y'(n) must be used as the input and output quantities
in the regressor vector. In other words, $\varphi'_{ee}(n)$ must be used in place of
$\varphi_{ee}(n)$ in the standard OBE algorithm. Two methods of generating $\varphi'_{ee}(n)$
were given in Section 3.4, Eqs. (3.30) and (3.31).

2) The output error, oe(n), of the adaptive system having the original ARMAX
plant must be used in place of ee(n) in the original OBE algorithm. This is
possible in light of the approximate equivalence between the equation error
of the "primed" system, ee'(n), and the output error, oe(n).

This algorithm will be referred to as SMM(OBE).
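The OBE recursions are not restated in this section, so the following is only a sketch under an explicit assumption: it uses the Dasgupta-Huang form of OBE, in which the weight given to each new datum is chosen to minimize the ellipsoid size measure κ(n) and no update is made when that optimal weight is not positive. It applies the two modifications listed above (regressor (3.31), and oe(n) in place of ee(n)). The bound γ on v²(n), the cap α on the weight, the initialization, the rejection-type stability check, and all function names are assumptions for illustration; the OBE variant of [Rao89] may differ in detail.

```python
import numpy as np


def is_stable(a):
    """True if 1 + a[0] q^-1 + ... + a[-1] q^-na has all poles inside the unit circle."""
    a = np.asarray(a, dtype=float)
    if not np.all(np.isfinite(a)):
        return False
    return bool(np.all(np.abs(np.roots(np.concatenate(([1.0], a)))) < 1.0))


def past(s, n, k):
    """s(n - k), with zero initial conditions."""
    return s[n - k] if n - k >= 0 else 0.0


def dh_weight(kappa, gamma, err, G, alpha):
    """Weight lambda*(n) minimizing kappa(n) (Dasgupta-Huang form); 0 means no update."""
    e2 = err * err
    if e2 == 0.0:
        return 0.0                          # simplification: skip exact-zero errors
    beta = (gamma - kappa) / e2
    if beta >= 1.0:
        return 0.0                          # optimal weight is not positive
    if abs(G - 1.0) < 1e-12:
        nu = (1.0 - beta) / 2.0
    else:
        den = 1.0 + beta * (G - 1.0)
        nu = alpha if den <= 0.0 else (1.0 - np.sqrt(G / den)) / (1.0 - G)
    return min(alpha, nu)


def smm_obe(x, y, na, nb, gamma, alpha=0.1, p0=100.0, kappa0=1.0):
    """SMM(OBE) sketch; gamma bounds v^2(n). Returns theta, oe and the update count."""
    N = len(x)
    d = na + nb + 1
    theta, P, kappa = np.zeros(d), p0 * np.eye(d), kappa0
    xf, yf, yhat, oe = (np.zeros(N) for _ in range(4))
    updates = 0
    for n in range(N):
        a, b = theta[:na], theta[na:]
        yf[n] = y[n] - sum(a[i - 1] * past(yf, n, i) for i in range(1, na + 1))
        xf[n] = x[n] - sum(a[i - 1] * past(xf, n, i) for i in range(1, na + 1))
        yhat[n] = (-sum(a[i - 1] * past(yhat, n, i) for i in range(1, na + 1))
                   + sum(b[j] * past(x, n, j) for j in range(nb + 1)))
        oe[n] = y[n] - yhat[n]
        phi = np.array([-past(yf, n, i) for i in range(1, na + 1)]
                       + [past(xf, n, j) for j in range(nb + 1)])
        Pphi = P @ phi
        G = float(phi @ Pphi)
        lam = dh_weight(kappa, gamma, oe[n], G, alpha)
        if lam > 0.0:
            updates += 1
            den = 1.0 - lam + lam * G
            # P^-1(n) = (1 - lam) P^-1(n-1) + lam phi phi^T (matrix inversion lemma)
            P = (P - lam * np.outer(Pphi, Pphi) / den) / (1.0 - lam)
            kappa = (1.0 - lam) * kappa + lam * gamma - lam * (1.0 - lam) * oe[n] ** 2 / den
            candidate = theta + lam * (P @ phi) * oe[n]
            if np.all(np.isfinite(candidate)) and is_stable(candidate[:na]):
                theta = candidate
    return theta, oe, updates


# usage on case 1) data with v(n) uniform on [-c, c], so v^2(n) <= c^2
rng = np.random.default_rng(0)
N = 1000
x = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), N)
y = np.zeros(N)
for n in range(N):
    y[n] = x[n] + 1.2 * past(y, n, 1) - 0.6 * past(y, n, 2)
c = 0.1
y += rng.uniform(-c, c, N)
theta_hat, oe, updates = smm_obe(x, y, na=2, nb=0, gamma=c ** 2)
print(theta_hat, updates / N)   # estimate and fraction of data used for updating
```

The returned update count corresponds to the "amount of data used" discussed at the end of Section 4.1.3.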

As a test of this algorithm, simulations were performed for three cases considered
in [Fa86], where the authors' algorithm (3.33) was shown to converge to unbiased
estimates of the plant:

Case 1) Sufficient order adaptive filter, unimodal performance surface.
For this case the output error adaptive system is described by:

$$\frac{B(q^{-1})}{A(q^{-1})} = \frac{1}{1 - 1.2q^{-1} + 0.6q^{-2}}, \qquad \frac{B(q^{-1},n)}{A(q^{-1},n)} = \frac{b_0(n)}{1 + a_1(n)q^{-1} + a_2(n)q^{-2}}$$

A uniformly distributed, zero mean, unit variance, white sequence was used as the input,
x(n). It was shown in [Str81] that the error surface of this adaptive system is unimodal.
Case 2) Sufficient order adaptive filter, multimodal performance surface.

A multimodal error surface was constructed in [So75], using the following adaptive
system:

$$\frac{B(q^{-1})}{A(q^{-1})} = \frac{1}{(1 - 0.7q^{-1})^2} = \frac{1}{1 - 1.4q^{-1} + 0.49q^{-2}}, \qquad \frac{B(q^{-1},n)}{A(q^{-1},n)} = \frac{b_0(n)}{1 + a_1(n)q^{-1} + a_2(n)q^{-2}}$$

The input sequence, x(n), was a correlated sequence, obtained by passing uniformly
distributed, zero mean, unit variance, white noise through the following filter:

$$(1 - 0.7q^{-1})^2(1 + 0.7q^{-1})^2 = 1 - 0.98q^{-2} + 0.2401q^{-4}$$


Case 3) Reduced order adaptive filter, multimodal performance surface.

The multimodal reduced order adaptive system examined here was introduced in
[Jo77]. It is described by:

$$\frac{B(q^{-1})}{A(q^{-1})} = \frac{0.05 - 0.4q^{-1}}{1 - 1.1314q^{-1} + 0.25q^{-2}}, \qquad \frac{B(q^{-1},n)}{A(q^{-1},n)} = \frac{b_0(n)}{1 + a_1(n)q^{-1}}$$

As in case 1), the input, x(n), was a uniformly distributed, zero mean, unit variance, white
sequence.
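The expanded polynomials quoted for case 2) can be checked by polynomial multiplication. In the sketch below, coefficient arrays are in ascending powers of $q^{-1}$; since `np.polymul` is a plain convolution of coefficient sequences, the ordering convention does not affect the product. As an aside (not stated in the text), the sum of squared coefficients gives the variance of the colored case 2) input when the driving white noise has unit variance.

```python
import numpy as np

minus = np.array([1.0, -0.7])   # 1 - 0.7 q^-1
plus = np.array([1.0, 0.7])     # 1 + 0.7 q^-1

plant_den = np.polymul(minus, minus)                           # (1 - 0.7 q^-1)^2
input_filter = np.polymul(plant_den, np.polymul(plus, plus))   # times (1 + 0.7 q^-1)^2

print(plant_den)      # coefficients of 1, q^-1, q^-2
print(input_filter)   # coefficients of 1, q^-1, ..., q^-4

# variance of the colored input for unit-variance white driving noise
print(np.sum(input_filter ** 2))
```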
In all the simulation cases, the output noise, v(n), was a uniformly distributed, zero
mean, white sequence independent of the white sequence generating the input, x(n). It
should also be noted here that stability projection (see Section 2.3.3) was used, because of
the IIR structure of the adaptive filter. The simulations were run with the following
questions in mind: 1) Is this algorithm a viable alternative for output error adaptive system
identification? More simply put, does this algorithm work at all? 2) If the answer to 1) is
affirmative, how does SMM(OBE) compare to the "standard" SMM algorithms
[So88], which use equation-error minimization algorithms such as LMS [Fa86] or RLS?
The RLS-based version will be denoted as SMM(RLS).

A partial answer to question 1) was obtained by running the simulations and
checking for global convergence. To illustrate proper operation of SMM(OBE),
simulations were run and the behavior of the parameter estimates was observed. For the
sufficient order cases 1) and 2), recall from Section 2.3.1 that MMSOE = $\sigma_v^2$ and the
parameters which yield this MMSOE are precisely those of the plant, $\theta_p$. On the other
hand, in the reduced order adaptive system of case 3), there is no "true" parameter vector
that the adaptive filter can take on that will match the plant exactly, since $\theta$ and $\theta_p$ have
different dimensions. The resulting MSOE surface for this adaptive system will thus have
a minimum point which is greater than $\sigma_v^2$, due to the inability of the adaptive filter to
"match" the plant. The disparity between the minimum MSOE achievable with a sufficient
order adaptive filter and that achieved by a reduced order adaptive filter is caused by what is
known as model mismatch. In other words, the minimum MSOE of a reduced order
adaptive system can be thought of as being separated into two components as follows:

$$\mathrm{MMSOE} = \mathrm{MMSOE}_v + \mathrm{MMSOE}_{mm}$$

where $\mathrm{MMSOE}_v$ is the minimum mean square output error due to the output noise, v(n),
obtained by considering the adaptive filter to be of sufficient order. Again recall from
Section 2.3.1 that $\mathrm{MMSOE}_v = \sigma_v^2$. The term $\mathrm{MMSOE}_{mm}$ is the minimum mean square
output error due to the model mismatch, which is obtained by taking $v(n) \equiv 0$ and generating
the resulting MSOE surface. This was done in [Jo77] for simulation case 3), where it
was shown that $\mathrm{MMSOE}_{mm} = 0.2066$, occurring for the adaptive filter parameters:

$$\theta = [a_1\ \ b_0]^T = [-0.906\ \ -0.311]^T$$

Recall that for simulation cases 2) and 3), the MSOE surfaces are multimodal [Fa86].
Therefore, to illustrate global convergence in these situations, initial estimates, θ(0), were
provided which were very close to a local minimum of the MSOE surface. The parameter
trajectories of the adaptive filter were then observed to see whether the parameters moved
away from the parameter vector yielding the local minimum MSOE toward the
one yielding the global minimum MSOE. The trajectories obtained for cases 1)-3) are
shown in Figure 4.1 for a simulation run of 1000 iterations, which was well after
convergence.† Shown are the initial parameter estimate, θ(0), the final estimate, θ(1000),
and the theoretical parameter estimate, $\theta_0$, yielding the MMSOE. In simulation cases 2)
and 3), the parameter estimates are seen to be adapted away from the parameter vector yielding a
local minimum MSOE towards the one yielding the global minimum MSOE.

† Convergence was determined by viewing the learning curve, to be discussed in Section 4.1.2.
Figure 4.1a Simulation case 1) trajectories

Figure 4.1b Simulation case 2) trajectories

[Figure 4.1c panel annotations: $\theta(0) = [0.5\ \ 0.2]^T$, $\theta(1000) = [-0.84\ \ -0.35]^T$, $\theta_0 = [-0.91\ \ -0.31]^T$]

Figure 4.1c Simulation case 3) trajectories

A remark is in order regarding the trajectories shown in Figure 4.1. Note that the
trajectories given for cases 1) and 2) are plots of $a_2$ versus $a_1$. The reason for considering
only the AR parameters as opposed to the X parameter $b_0$ is that in the sufficient order
case, the MSOE surface of an output error adaptive system is quadratic in the X
parameters, and the X parameter estimate which minimizes the MSOE is in fact the true
parameter, $b_0$. Thus there is no problem obtaining unbiased estimates for these parameters,
because there are no local minima with respect to them, as might happen with the AR
parameter estimates. However, this is not true in reduced order adaptive systems.
Therefore, in the reduced order case 3), the plot of $a_1$ versus $b_0$ is considered. This is
because the MSOE surface is a highly nonlinear function of both the AR and X
coefficients,† and thus, in addition to the behavior of the $a_1$ parameter, the behavior of the
estimate $b_0$ needs to be observed for proper convergence to the parameter vector yielding
MMSOE.

† See [Jo77] or [Sh89] for the explicit expression for the MSOE surface, i.e., the expression of MSOE
in terms of $b_0$ and $a_1$ with v(n) = 0.

Thus from the results shown in Figure 4.1, it appears that the answer to question
1) above is "yes," i.e., SMM(OBE) did in fact identify the parameters of an ARMAX plant
in the output error adaptive system configuration for simulation cases 1)-3). Next,
attention is given to the more interesting (and more difficult) question 2), which is the
subject of the next section.

4.1.2  Performance  of  SMM(OBE)  versus  SMM(RLSFF) 

Depending on the application, the environment in which an adaptive system is
operating could have either a large or small noise content. It is therefore of general interest
to examine the performance of adaptive algorithms operating on systems with varying
levels of noise. Furthermore, the examination of performance with respect to different
noise levels can also serve as a basis of comparison between different algorithms. Of
particular interest here is an investigation of the performance of a "standard" SMM
algorithm, SMM(RLSFF), with respect to the new SMM algorithm, SMM(OBE).

To compare the two algorithms in this manner, simulations were performed on
simulation cases 1)-3) for varying signal-to-noise ratios (SNR's). The SNR is defined as:

$$\mathrm{SNR} = \frac{\sigma_x^2}{\sigma_v^2}$$

Usually (as will be the case in this discussion), this quantity is given in decibels (dB),
converting the above expression to:

$$\mathrm{SNR(dB)} = 10\log_{10}\frac{\sigma_x^2}{\sigma_v^2}$$

The SNR was varied from -10 to 10 dB, in steps of 2 dB, which was accomplished by
appropriately scaling the uniform output noise, v(n), with respect to the input signal. The
algorithm SMM(RLSFF) was implemented with a forgetting factor, λ, of 0.99. At each
value of the SNR, SMM(RLSFF) and SMM(OBE) were compared with respect to two
criteria:

1) The minimum value of MSOE which was achieved (MMSOE).

2) Transient MSOE behavior.

Both criteria were evaluated through consideration of the adaptive system learning
curve, which is a plot of the MSOE versus n, the number of iterations. Initially, at the start
of adaptation, the MSOE is usually high, since the initial adaptive filter parameters, θ(0),
are probably much different from the true plant parameters, $\theta_p$. As adaptation proceeds, the
estimates, θ(n), generally adapt so as to get closer to $\theta_p$. This process yields a
monotonically decreasing sequence of MSOE values as a function of the time index, n. For
an algorithm which does in fact converge, this sequence will approach some constant
minimum value as n gets large (i.e. the MMSOE).†

Note that the characteristics described above for the learning curve apply to the
curve $E[oe^2(n)]$ versus n. To determine this curve experimentally would require taking an
ensemble average of an infinite number of independent realizations of $oe^2(n)$ versus n.
Obviously this is not possible. However, averaging a relatively small number of the curves
$oe^2(n)$ versus n provides a very good indication of MSOE performance, even though these
curves may not have the precise monotone characteristics of the actual learning curve. In
fact, there will be considerable fluctuation in these curves, as will be seen.

† For this to be true, the common assumption made here (and throughout this discussion) is that the plant
is fixed and that the input and noise sequences, x(n) and v(n), are stationary.
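A minimal sketch of the experimental learning curve described above: the ensemble average of $oe^2(n)$ over a small number of independent runs, plus a tail time average of the kind the text goes on to use as an approximation of the MMSOE (the last 50 values). The function names and the synthetic output errors in the usage lines are hypothetical.

```python
import numpy as np


def learning_curve(oe_runs):
    """Ensemble average of oe^2(n): rows are independent runs, columns are n."""
    oe_runs = np.asarray(oe_runs, dtype=float)
    return np.mean(oe_runs ** 2, axis=0)


def steady_state_mse(curve, tail=50):
    """Time average of the last `tail` points of a learning curve."""
    return float(np.mean(np.asarray(curve)[-tail:]))


# usage with synthetic output errors: a decaying transient plus unit-variance noise
rng = np.random.default_rng(0)
n = np.arange(1000)
runs = [5.0 * np.exp(-n / 100.0) * rng.standard_normal(1000) + rng.standard_normal(1000)
        for _ in range(10)]
curve = learning_curve(runs)
print(steady_state_mse(curve))
```

With only ten runs the averaged curve still fluctuates noticeably, which is the behavior the text anticipates.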

With  these  properties  of  the  learning  curve  in  mind,  the  two  performance  criteria  are 
now  considered  for  SMM(RLSFF)  and  SMM(OBE). 

MMSOE 

After viewing the learning curve resulting from each simulation case at every SNR
using both algorithms, it appeared that all of the curves reached their minimum, steady-state
levels well before 1000 iterations. See Figure 4.2 for a typical example of this transient
behavior for both SMM(RLSFF) and SMM(OBE). The quantity used as an approximation
to the MMSOE was a time average of the last 50 values of the experimental learning curve.
This value will be called the steady-state MSE (SSMSE). To compare the two algorithms
with respect to this quantity, the SSMSE was plotted as a function of the SNR. These
plots are shown for each of the simulation cases in Figures 4.3a, 4.4a, and 4.5a. Observe
that the curves have an exponential characteristic. To see why this is so, recall from
Section 2.3.1 that the minimum value of MSOE is $\sigma_v^2$, occurring when $\theta = \theta_p$. Therefore,
the experimental curves should approximate a curve of $\sigma_v^2$ versus SNR. But recall that the
SNR is related to $\sigma_v^2$ as follows:

$$\mathrm{SNR(dB)} = 10\log_{10}\frac{\sigma_x^2}{\sigma_v^2}$$

Solving for $\sigma_v^2$ yields:

$$\sigma_v^2 = \sigma_x^2\,\mathrm{alog}\!\left(-\frac{\mathrm{SNR(dB)}}{10}\right)$$

where the function alog is the inverse of the (base 10) log function. It is thus seen that the
theoretical MMSOE versus SNR curve has the above exponential form. The theoretical
curves are given in Figures 4.3b, 4.4b, and 4.5b for each simulation case.
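The noise scaling and the theoretical curves can be sketched as follows, assuming the SNR is measured relative to the input power $\sigma_x^2$ (unit variance for the white inputs of cases 1) and 3)) and that v(n) is uniform on [-c, c], whose variance is c²/3. For the reduced order case 3), the model mismatch term $\mathrm{MMSOE}_{mm} = 0.2066$ from [Jo77] is added to $\sigma_v^2$. Function names are hypothetical.

```python
import numpy as np


def noise_variance(snr_db, sigma_x2=1.0):
    """sigma_v^2 = sigma_x^2 * alog(-SNR(dB)/10), alog being the inverse of log10."""
    return sigma_x2 * 10.0 ** (-snr_db / 10.0)


def uniform_half_width(var):
    """Half-width c of a zero-mean uniform density on [-c, c] with variance c^2/3."""
    return np.sqrt(3.0 * var)


snr_grid = np.arange(-10, 11, 2)        # -10 dB to 10 dB in 2 dB steps
sigma_v2 = noise_variance(snr_grid)     # theoretical MMSOE, unit-variance white input
mmsoe_case3 = sigma_v2 + 0.2066         # case 3): MMSOE_v + MMSOE_mm
for s, v2, c, m3 in zip(snr_grid, sigma_v2, uniform_half_width(sigma_v2), mmsoe_case3):
    print(f"{int(s):4d} dB  sigma_v^2={v2:8.4f}  c={c:6.4f}  case 3 MMSOE={m3:8.4f}")
```

For the colored input of case 2), `sigma_x2` would be the variance of the filtered input rather than 1.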
Figure 4.4a Simulation case 2) SSMSE curve

Figure 4.4b Theoretical MMSOE for case 2)

Figure 4.5a Simulation case 3) SSMSE curve

Figure 4.5b Theoretical MMSOE for case 3)
Upon viewing the experimental curves of Figures 4.3a-4.5a, it can be seen that
SMM(OBE) performs comparably to SMM(RLSFF). Also note that the experimental
curves are very close to their theoretical lower bounds of MMSOE performance. This is
encouraging, since RLS algorithms are based on MSE minimization, while the OBE
algorithm is based on minimizing ellipsoidal membership sets in a geometrical sense.
Though these two schemes minimize different quantities, it can be seen in Figures 4.3-4.5
that SMM(OBE) does in fact perform very well with respect to the MSE minimization
criterion of RLS, especially in the region of low SNR values. At higher SNR levels,
however, SMM(OBE) occasionally, but not consistently, yielded values of SSMSE which
significantly exceed those of SMM(RLSFF). This was especially evident in simulation
case 1) (Figure 4.3), but can also be seen to some degree in each of the curves. Thus it
might be conjectured that anomalous SSMSE behavior is more prone to occur in
SMM(OBE) at higher SNR's than at lower values. In particular, for negative SNR levels,
all the simulation curves in Figures 4.3-4.5 indicate consistently good SMM(OBE)
performance. The apparently anomalous behavior was observed only when the SNR was
greater than zero.

The  above  observations  suggest  near  optimum  performance  of  SMM(OBE)  with 
respect  to  the  SSMSE  criterion,  especially  at  low  SNR’s.  Next,  the  transient  characteristics 
of  the  learning  curve  will  be  addressed  and  it  will  be  seen  that  SMM(OBE)  actually  exhibits 
superior  transient  behavior  to  that  of  SMM(RLSFF)  at  low  SNR's. 

Transient  MSOE  behavior 

In many applications (especially time-varying cases) the best steady-state
performance may not be the only important concern. The manner in which the steady state
is achieved may also be of extreme importance. Examination of the learning curve also
provides much insight into this transient behavior.
Upon viewing the learning curves of SMM(RLSFF) and SMM(OBE) for each
SNR, some very interesting characteristics were observed. In most cases the learning
curves obtained when using SMM(RLSFF) exhibited much higher peaks at smaller values
of n compared to those of SMM(OBE). Though this usually occurred, it was especially
evident at low SNR's, where SMM(RLSFF) peaked at huge values of MSOE compared to
the peaking of SMM(OBE). See again Figure 4.2 for an example of this behavior.

However, as with the SSMSE comparisons, some seemingly anomalous behavior
was observed in the learning curve of SMM(OBE) at some higher SNR's. In fact, the
same SNR's which yielded higher SSMSE yielded very peculiar learning curves. As can
be seen in Figure 4.3a, this unusual behavior was exhibited in simulation case 1) at the
SNR's of 6 and 10 dB. The corresponding learning curves are shown in Figures 4.6 and
4.7. Referring to Figure 4.6, it is seen that both algorithms peak at about the same time
and magnitude. The SMM(OBE) learning curve of Figure 4.6b shows that up to
approximately 150 iterations, SMM(OBE) appears to be converging smoothly as it did in
most of the other simulations (e.g., see Figure 4.2b). However, after this point, the learning
curve exhibits erratic behavior and subsequently does not settle to a level comparable to
SMM(RLSFF). In Figure 4.7, both learning curves appear on the same plot for
comparison. Referring to Figure 4.7a, it is seen that the peak of the learning curve of
SMM(OBE) is significantly higher than that of SMM(RLSFF), though they both reach
steady state at about the same time. To observe the steady state characteristics of the
curves, the portion of Figure 4.7a from 800 to 1000 iterations was expanded in Figure
4.7b. Erratic steady state behavior of the learning curve of SMM(OBE) is again observed.
Figure 4.7a Simulation case 1) learning curves, SMM(RLSFF) and SMM(OBE), SNR = 10 dB

Figure 4.7b Simulation case 1) learning curves, SMM(RLSFF) and SMM(OBE), SNR = 10 dB (iterations 800 to 1000)
An additional instance of anomalous behavior was found in SMM(RLSFF),
which was surprising. In simulation case 2) at the low SNR's of -2 and -10 dB,
SMM(RLSFF) took very long to converge compared both to SMM(OBE) at those SNR's
and to itself at all other SNR's. See Figures 4.8 and 4.9 for these learning curves.
Since there is so much peaking at low values of n, the curves of Figures 4.8a and 4.9a
were viewed starting from n = 500 in Figures 4.8b and 4.9b in order to see the actual point
at which steady state was achieved. It can be seen that in both cases, more than 800
iterations were needed for SMM(RLSFF) to achieve steady state. Figures 4.8c and 4.9c
show the learning curves yielded by SMM(OBE) for the same simulation cases, and it can
be seen that SMM(OBE) converged smoothly in less than 200 iterations, as it did for all
cases of negative SNR.
Figure 4.8a Simulation case 2) learning curve for SMM(RLSFF), SNR = -2 dB

Figure 4.9c Simulation case 2) learning curve for SMM(OBE), SNR = -10 dB

4.1.3  Summary  of  Simulations 

To summarize the observations of the preceding section, a few key points will be
reiterated here. First of all, both SMM(RLSFF) and SMM(OBE) performed well in the
cases studied. However, unusual behavior of both algorithms was found which seemed to
follow a trend with respect to varying SNR levels. In particular, at a few low (negative)
SNR's, SMM(RLSFF) was found to converge very slowly. SMM(OBE), on
the other hand, converged very rapidly for all simulation cases at every SNR less than zero.
This might suggest greater dependability of SMM(OBE) relative to SMM(RLSFF) in
the presence of higher noise levels. At larger SNR's (> 0), SMM(RLSFF) appears to be
the more dependable algorithm, as SMM(OBE) was seen not to converge very well in a
few cases.
Two further observations on the performance of SMM(OBE) as the SNR varied are worth
mentioning. The first is a consistent increase in the amount of data used by SMM(OBE)
(i.e., the number of updates to θ) as the noise level increased (i.e., as the SNR decreased).
This is an interesting and intuitively satisfying property for an information-dependent
updating scheme to have: it seems reasonable that as the signal becomes more and more
corrupted by noise, the algorithm needs to take more and more "looks" at it to extract the
proper information. Table 1 lists, for each simulation case and SNR, the number of updates
out of 1000 iterations, averaged over 10 independent runs of SMM(OBE). The inverse
relationship between the SNR and the number of updates is evident for all but a few SNR
increments. Note that in only one simulation did the amount of data used for updating θ
exceed 10% of the total data (more than 100 updates out of 1000). In fact, for most cases,
the parameter estimates were updated less than 8% of the time.

TABLE 1

Average Number of Updates of SMM(OBE)

SNR (dB)   Case 1)   Case 2)   Case 3)
   -10       88.5      97.7      76.0
    -8       80.0     104.5      65.8
    -6       71.8      77.9      60.S
    -4       66.3      70.2      58.5
    -2       68.3      65.2      62.2
     0       67.5      52.6      63.6
     2       65.7      42.4      53.6
     4       45.0      41.9      51.5
     6       43.6      42.3      60.3
     8       36.2      42.7      60.4
    10       29.5      45.8      52.3

The second observation made on SMM(OBE) was that its operation is insensitive to the
choice of the magnitude bound on the noise, γ. This characteristic has also been observed in
other OBE algorithms [Rao89, Sec. 3.4]. The observation was made through the following
experiment. Initially, the noise bound γ was chosen to agree with the noise level at an SNR
of 0 dB, and this value of γ was then used for SNR's ranging from -10 to 10 dB. Note that
for low SNR's (<0) the 0 dB bound is not a bound at all, and for high SNR's (>0) it is too
large, which could degrade performance. Since the noise distribution was chosen to be
uniform with zero mean, its variance is γ²/3, so with the signal power normalized to one the
magnitude bound can be calculated using the formula:

$$\gamma_{\mathrm{SNR(dB)}} = \sqrt{3/\mathrm{SNR}}$$

Note that on the right side of this equation, the SNR is not in dB. Evaluating the formula
for SNR = 0, -10 and 10 dB yields $\gamma_0 = \sqrt{3} \approx 1.73$, $\gamma_{-10} = \sqrt{30} \approx 5.48$
and $\gamma_{10} = \sqrt{0.3} \approx 0.548$. Using $\gamma_0$ therefore underestimates the bound by a
factor of $\sqrt{10} \approx 3.16$ in the simulations at -10 dB and overestimates it by the same
factor in the simulations at 10 dB. In spite of these misjudgments in γ, SMM(OBE) was
observed to perform virtually identically to when the proper bound was used. It thus appears
that the performance of SMM(OBE) is insensitive to the accuracy of γ.
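The bound formula and the size of the mismatch can be checked numerically. The short Python sketch below is an illustration, not part of the original study; it assumes unit signal power (which reproduces the values quoted above) and the standard conversion SNR = 10^(SNR(dB)/10). The function name `noise_bound` is hypothetical.

```python
import math

def noise_bound(snr_db: float, signal_power: float = 1.0) -> float:
    """Magnitude bound gamma of zero-mean noise uniform on [-gamma, gamma].

    Uniform noise on [-gamma, gamma] has variance gamma**2 / 3, so matching a
    linear SNR gives gamma = sqrt(3 * signal_power / SNR).
    signal_power = 1 is an assumption that reproduces the values in the text.
    """
    snr_linear = 10.0 ** (snr_db / 10.0)  # dB -> power ratio
    return math.sqrt(3.0 * signal_power / snr_linear)

gamma_0 = noise_bound(0.0)  # the single bound used for every SNR in the experiment
for snr_db in (-10.0, 0.0, 10.0):
    true_gamma = noise_bound(snr_db)
    # ratio > 1: gamma_0 underestimates the true bound; ratio < 1: it overestimates it
    print(f"SNR={snr_db:+.0f} dB  gamma={true_gamma:.3f}  ratio={true_gamma / gamma_0:.3f}")
```

The printed bounds are about 5.48, 1.73 and 0.548, and the ratio of the true bound to γ_0 is about 3.16 at -10 dB and about 0.316 at 10 dB, i.e. the factor of √10 discussed above.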

In practice, this is a very important property for an OBE algorithm to have, since the
assumptions of an algorithm often cannot be met exactly. It is therefore crucial that a
deviation of the true conditions from the ideal case does not cause a complete failure in
performance; this is a robustness property. Thus SMM(OBE) appears to be robust with
respect to the choice of the noise bound.

4.2  A  Proposal  of  Two  New  Output  Error  Adaptive  Algorithms 

The final results of this thesis involve an RLS-type derivation of algorithms for use with
the output error adaptive system. By using previously derived expressions for the output
error, oe(n), and its gradient, two algorithms can be derived that are identical to the RPE
algorithms of Section 3.2.2, except for the construction of the matrix R(n). Further
investigations, both simulations and analysis, are needed to determine the convergence
properties of these algorithms.

Proceeding as in Section 3.2, it is desired to minimize the criterion J_LS(n) of (3.11)
with oe(n) as the error term:

$$J_{LS}(n) = \sum_{i=1}^{n} oe^2(i)$$

Again, taking derivatives with respect to the least-squares estimate after n iterations,
θ(n), and setting the result to zero gives:

$$2\sum_{i=1}^{n} oe(i)\,\frac{\partial oe(i)}{\partial \theta(n)} = 0 \qquad (4.2)$$


This expression will be implemented in two different ways, giving rise to two algorithms
that differ slightly from RPE and resemble the instrumental variable method
[Lj83, Sec. 2.2.2], as will be discussed in Section 4.2.3.


4.2.1  Algorithm  #1 


From Section 2.1.2, oe(i) = y(i) - ŷ(i), where

$$\hat{y}(i) = \theta^T(i-1)\,\varphi_{oe}(i) \qquad (4.3)$$


Now, from (3.9) of Section 3.1.2, and assuming slow adaptation of the adaptive filter
coefficients, the derivative of oe(i) (which is the transpose of the gradient) is:

$$\frac{\partial oe(i)}{\partial \theta(n)} \approx \frac{\partial oe(i)}{\partial \theta(n-1)} = -\frac{1}{A(q^{-1},n-1)}\,\varphi_{oe}^T(i) = -\varphi_{oe}'^{T}(i) \qquad (4.4)$$

where the prime denotes autoregressive filtering by the denominator polynomial of the
adaptive filter. Substituting (4.4) and (4.3) into (4.2), with θ(i-1) in (4.3) replaced by
the current estimate θ(n), and dividing through by -2 yields:

$$\sum_{i=1}^{n}\left[y(i) - \theta^T(n)\,\varphi_{oe}(i)\right]\varphi_{oe}'(i) = 0$$


Solving for θ(n) gives:

$$\theta(n) = \left[\sum_{i=1}^{n}\varphi_{oe}'(i)\,\varphi_{oe}^T(i)\right]^{-1}\sum_{i=1}^{n}\varphi_{oe}'(i)\,y(i)$$


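Similar to Section 3.2.1, a recursive formulation can be derived, yielding:

$$\theta(n) = \theta(n-1) + R^{-1}(n)\,\varphi_{oe}'(n)\,oe(n) \qquad (4.5a)$$

where

$$R(n) = \sum_{i=1}^{n}\varphi_{oe}'(i)\,\varphi_{oe}^T(i) = R(n-1) + \varphi_{oe}'(n)\,\varphi_{oe}^T(n) \qquad (4.5b)$$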

Examination of (4.5) shows this algorithm to be identical to the RPE algorithm, except for
the construction of R(n): here both a filtered and an unfiltered version of the regressor
φ_oe(n) are used.
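The recursion (4.5) can be sanity-checked against the batch solution it is derived from. The Python/NumPy sketch below is illustrative only: it treats the regressor sequences φ_oe(n) and φ'_oe(n) as given arrays (in the actual algorithm they are generated online from the current estimate), and it assumes the initialization R(0) = δI, θ(0) = 0, which the text does not specify. The function names are hypothetical. Under that initialization, R(n)θ(n) = Σφ'_oe(i)y(i) at every step, so the recursion reproduces the δ-regularized batch estimate up to rounding, for any regressor sequences.

```python
import numpy as np

def recursive_estimate(phi_f, phi, y, delta=1e-3):
    """Recursion (4.5) with the hypothetical start R(0) = delta*I, theta(0) = 0.

    phi_f : (N, p) array, rows are the filtered regressor phi'_oe(n)
    phi   : (N, p) array, rows are the unfiltered regressor phi_oe(n)
    y     : (N,) array of desired outputs
    """
    _, p = phi.shape
    R = delta * np.eye(p)
    theta = np.zeros(p)
    for n in range(len(y)):
        R = R + np.outer(phi_f[n], phi[n])                 # (4.5b); R(n) is not symmetric
        e = y[n] - theta @ phi[n]                           # a priori error oe(n)
        theta = theta + np.linalg.solve(R, phi_f[n] * e)    # (4.5a)
    return theta, R

def batch_estimate(phi_f, phi, y, delta=1e-3):
    """Closed-form solution of the normal equations with the same delta*I term."""
    p = phi.shape[1]
    R = delta * np.eye(p) + phi_f.T @ phi   # sum of phi'_oe(i) phi_oe(i)^T
    return np.linalg.solve(R, phi_f.T @ y)  # times sum of phi'_oe(i) y(i)

rng = np.random.default_rng(0)
N, p = 200, 3
phi = rng.standard_normal((N, p))
phi_f = phi + 0.3 * rng.standard_normal((N, p))  # arbitrary stand-in for the filtered regressor
y = rng.standard_normal(N)

theta_rec, R_final = recursive_estimate(phi_f, phi, y)
theta_batch = batch_estimate(phi_f, phi, y)
print(np.max(np.abs(theta_rec - theta_batch)))
```

Because R(n) is built from two different vectors it is generally not symmetric, so the sketch uses a general linear solve rather than a Cholesky factorization; a practical implementation would more likely propagate R^{-1}(n) directly with the matrix inversion lemma, which also holds for non-symmetric rank-one updates.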

4.2.2  Algorithm  #2 

For this algorithm, the alternative expression (2.13) is used for oe(n) in (4.2); it is
repeated here:

$$oe(n) = \frac{1}{A(q^{-1},n-1)}\left[y(n) - \theta^T(n-1)\,\varphi_{ee}(n)\right] = y'(n) - \theta^T(n-1)\,\varphi_{ee}'(n) \qquad (4.6)$$


Here,  again,  a  primed  quantity  stands  for  a  quantity  which  is  autoregressively  filtered  by 
the  denominator  polynomial  of  the  adaptive  filter. 
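The identity behind (4.6) can be illustrated numerically when θ is held fixed, which is the limiting case of the slow-adaptation assumption. The sketch below is not taken from Section 2: it assumes the sign convention A(q^-1) = 1 - a_1 q^-1 - ... - a_N q^-N, a parameter vector θ = [a_1, ..., a_N, b_0, ..., b_M]^T, zero initial conditions, and hypothetical coefficient values. It checks that filtering the equation error by 1/A reproduces the output error, and that this equals y'(n) - θ^T φ'_ee(n) built from primed (filtered) signals.

```python
import numpy as np

def ar_filter(a, u):
    """Return u' = u / A(q^-1), i.e. u'(n) = u(n) + sum_k a_k u'(n-k), zero initial state.

    Assumed convention: A(q^-1) = 1 - a_1 q^-1 - ... - a_N q^-N, with a = [a_1, ..., a_N].
    """
    out = np.zeros(len(u))
    for n in range(len(u)):
        acc = u[n]
        for k, a_k in enumerate(a, start=1):
            if n >= k:
                acc += a_k * out[n - k]
        out[n] = acc
    return out

def delayed(u, k):
    """Return the sequence u(n-k), with zeros before time 0."""
    return np.concatenate([np.zeros(k), u[:len(u) - k]])

def equation_error(a, b, y, x):
    """y(n) - theta^T phi(n) with phi(n) = [y(n-1), ..., y(n-N), x(n), ..., x(n-M)]."""
    return (y
            - sum(a_k * delayed(y, k) for k, a_k in enumerate(a, start=1))
            - sum(b_j * delayed(x, j) for j, b_j in enumerate(b)))

rng = np.random.default_rng(1)
x = rng.standard_normal(300)
y = rng.standard_normal(300)   # arbitrary plant output; the identity holds for any data
a = np.array([0.5, -0.2])      # hypothetical fixed a_1, a_2 (1/A is stable)
b = np.array([1.0, 0.4])       # hypothetical fixed b_0, b_1

# Output error: y_hat is produced by the fixed IIR filter B/A driven by x.
y_hat = ar_filter(a, sum(b_j * delayed(x, j) for j, b_j in enumerate(b)))
oe = y - y_hat

# Equation error uses past *plant* outputs y(n-k) in the regressor phi_ee(n).
ee = equation_error(a, b, y, x)

# Right-hand side of (4.6): y'(n) - theta^T phi'_ee(n), from 1/A-filtered y and x.
rhs = equation_error(a, b, ar_filter(a, y), ar_filter(a, x))

print(np.max(np.abs(ar_filter(a, ee) - oe)), np.max(np.abs(rhs - oe)))
```

Both printed differences are at rounding level. With a time-varying θ the equality becomes an approximation, which is where the slow-adaptation assumption enters.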


Proceeding as before, (4.6) and (4.4) are now substituted into (4.2), yielding the
following equation:

$$\sum_{i=1}^{n}\left[y'(i) - \theta^T(n)\,\varphi_{ee}'(i)\right]\varphi_{oe}'(i) = 0$$


Solving for θ(n) yields:

$$\theta(n) = \left[\sum_{i=1}^{n}\varphi_{oe}'(i)\,\varphi_{ee}'^{T}(i)\right]^{-1}\sum_{i=1}^{n}\varphi_{oe}'(i)\,y'(i)$$


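The recursive version of the above expression for θ(n) is:

$$\theta(n) = \theta(n-1) + R^{-1}(n)\,\varphi_{oe}'(n)\,oe(n) \qquad (4.7a)$$

where

$$R(n) = \sum_{i=1}^{n}\varphi_{oe}'(i)\,\varphi_{ee}'^{T}(i) = R(n-1) + \varphi_{oe}'(n)\,\varphi_{ee}'^{T}(n) \qquad (4.7b)$$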

4.2.3 Discussion of the Algorithms

An interesting characteristic that distinguishes these algorithms from the RPE method is
that they use two different vectors in the calculation of R(n). This feature is reminiscent
of the instrumental variable method [Lj83, Sec. 2.2.2]. The instrumental variable method is
an algorithm for adapting an equation error adaptive filter, and it is identical to either
(4.5) or (4.7) with the following substitutions:

$$
\begin{array}{ccccc}
(4.5) & & \mathrm{IV} & & (4.7) \\
\varphi_{oe}'(n) & \Leftarrow & \zeta(n) & \Rightarrow & \varphi_{oe}'(n) \\
\varphi_{oe}(n) & \Leftarrow & \varphi_{ee}(n) & \Rightarrow & \varphi_{ee}'(n) \\
oe(n) & \Leftarrow & ee(n) & \Rightarrow & oe(n)
\end{array}
$$

The vector ζ(n) is called the instrumental variable, and it can be chosen in many ways.
See [Lj83, Sec. 2.2.2 and 3.6.3] for discussions on this subject.
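The correspondence between (4.5), (4.7) and the instrumental variable method can be made concrete with one routine whose behavior depends only on which vectors and which target signal are passed in. The sketch below is illustrative: the function name, the initialization R(0) = δI, θ(0) = 0, the first-order plant and the instrument ζ(n) = [x(n-1), x(n)]^T are all hypothetical choices, not prescriptions from [Lj83] or from this thesis. For noise-free data from a plant inside the model class, the equation error at the true parameters is zero, so the estimate should recover them up to the small bias introduced by δ.

```python
import numpy as np

def iv_type_recursion(left, right, target, delta=1e-3):
    """Shared recursion with hypothetical start R(0) = delta*I, theta(0) = 0:
        e(n)     = target(n) - theta(n-1)^T right(n)
        R(n)     = R(n-1) + left(n) right(n)^T
        theta(n) = theta(n-1) + R(n)^{-1} left(n) e(n)
    (4.5): left=phi'_oe, right=phi_oe,  target=y
    (4.7): left=phi'_oe, right=phi'_ee, target=y'
    IV:    left=zeta,    right=phi_ee,  target=y
    """
    p = right.shape[1]
    R, theta = delta * np.eye(p), np.zeros(p)
    for l_n, r_n, t_n in zip(left, right, target):
        R = R + np.outer(l_n, r_n)
        e = t_n - theta @ r_n
        theta = theta + np.linalg.solve(R, l_n * e)
    return theta

# Noise-free first-order plant y(n) = a*y(n-1) + b*x(n), hypothetical values.
rng = np.random.default_rng(2)
a_true, b_true, N = 0.6, 1.5, 2000
x = rng.standard_normal(N)
y = np.zeros(N)
for n in range(N):
    y[n] = (a_true * y[n - 1] if n > 0 else 0.0) + b_true * x[n]

x_prev = np.concatenate([[0.0], x[:-1]])
y_prev = np.concatenate([[0.0], y[:-1]])
phi_ee = np.column_stack([y_prev, x])  # equation error regressor [y(n-1), x(n)]
zeta = np.column_stack([x_prev, x])    # instrument built from inputs only [x(n-1), x(n)]

theta_iv = iv_type_recursion(zeta, phi_ee, y)
print(theta_iv)
```

Only the arguments change between the three methods, which is the point of the substitution table: the structural difference from RPE lies entirely in pairing two different vectors when R(n) is accumulated.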

A final comment regarding algorithm (4.7) is worth making. Since this algorithm uses both
the equation error regressor and the usual output error regressor, it would be interesting,
and certainly exciting, to see whether it exhibits characteristics of equation error adaptive
schemes. Of particular interest is whether it possesses a unimodal performance surface, as
the SMM algorithms do, since they also combine elements of equation error and output error
schemes.